ABSTRACT

This study of the federal government's evaluation of
social programs indicates that it is virtually impossible to
establish a bias-free, valid, and reliable system of inquiry to
determine the effects of social programs. Divided into five chapters,
the document examines the aspirations and limitations of evaluations,
methodology, evaluation in the legislative and executive branch, and
considers the need for continuous assessment. The authors point out
that data for describing the nature and extent of social problems
provide only rough approximations; although methodologies are
sophisticated mathematically and statistically, results are not
relevant. Also, federal evaluation efforts suffer from being
conducted in an institutional climate that abounds with
administrative barriers and which perpetuates service to political
interests. Although the General Accounting Office (GAO), the
investigative arm of Congress, has maintained its reputation for
independence, its studies offer little constructive commentary on
what is feasible social policy. The Congressional Research Service
(CRS) relies upon secondary sources for its evaluations. Finally,
the reality of the world in which social policies are formulated and
implemented is the most serious limitation of the usefulness of
social program evaluations. Social programs are narrow, arbitrary,
and fail to take full account of the global nature of many social
problems. The authors conclude that the future role of evaluators
seems to be that of analyzing program operations rather than passing
judgment on program impacts. (Author/KC)
Foreword



Expanded needs for information that can be used to measure
the effects of a broad array of social programs have occurred as a
result of the federal government's increased social policy role
during the past two decades. During this period, evaluation of the
efficiency and effectiveness of federal social programs has been
related increasingly to the policy formulation process. Within the
federal government, agencies' responsibilities for program
evaluation have increased rapidly in response to congressional
requirements for the assessment of the impacts of major social
legislative initiatives.

This study assesses the current state of the art of both process
and impact evaluation, with emphasis on the limitations of
evaluation tools currently in use. Levitan and Wurzburg also have
provided an in-depth analysis of the federal government's
institutional arrangements and process for evaluating social
programs. The need for evaluation and the use of best available
evaluation methodologies are clearly recognized by the authors.
However, Levitan and Wurzburg detail serious limitations of both
evaluation tools and institutional arrangements for federal social
program evaluation. The study is published with the expectation
that the authors' critique of social program evaluation will
contribute to informed discussion and ultimate improvements in
program evaluation as an important tool for planning and public
policy development.

The facts presented in this study and the observations and 
viewpoints expressed are the sole responsibility of the authors. 
They do not necessarily represent positions of the W. E. Upjohn
Institute for Employment Research.

E. Earl Wright
Director

Kalamazoo, Michigan 
September 1979 
Preface



Federal programs in the United States are under continuous
scrutiny, and some have been subject to sustained attack.
Advocates of an activist federal role are on the defensive, realizing
that the era of rapid growth of both the economy and social
programs has halted for the time being. Retrenchment in certain
areas is even likely. Although the citizenry continue to clamor for
new forms of federal assistance, the realities of the vast American
production machine require at least selective slowdowns in the rate
of growth. Sound management may even require outright
curtailment of some efforts. If society cannot afford to support all
the existing social programs, the challenge before policymakers is
to select the likely candidates for cut-backs, and improve those
that remain.

In a broad sense, the need is to establish a mechanism to permit
rational allocation of resources. There is an increasing demand to
know what past spending has accomplished and where future
spending will produce the biggest bang for the buck. Policymakers
have, therefore, turned to the modern day wisemen for help in
divining the means to determine which programs are to be axed
and which should be continued or even expanded.

There is no record that the founding fathers were directly
concerned with problems of evaluating social programs; there
was precious little to evaluate. But the system they designed is
admirably suited for obtaining, in the end, sufficient evidence for
policymakers to draw informed conclusions about the effectiveness
of federal efforts. The separation of the three branches of
government, particularly the executive and legislative, permits,
indeed encourages, independent evaluation of social programs.
While Congress has been slow in seizing the opportunity to obtain
independent assessment of the programs, it is rapidly establishing
a network that feeds back information about the impact of the
multitude of efforts that it has mandated. The executive branch, in
even greater need of scrutinizing the operation and impact of
social programs because it has the direct responsibility for
implementing and administering these efforts, has been established
in the business for years. But the potential role of evaluation
in a system of checks and balances has not been fully realized.

This study examines the tools that evaluators have developed to 
practice their trade and reviews the institutional arrangements that 
have been devised for the care and nourishment of the evaluators. 
The volume also assesses the comparative strengths and
weaknesses of the evaluation establishments in the two branches
of government. It concentrates especially on the programs
assigned to the Department of Health, Education, and Welfare
and the Department of Labor, the two major federal agencies 
responsible for administering social measures and the principal 
source of insights for policymaking in both branches. 

Arguing that it is better to make decisions on the basis of the 
best available information (although not denying the dangers of
only a little knowledge), the authors question whether the tools
available to evaluators and the climate in which they work permit 
them to design objective criteria and valid and reliable methods. 
The central issue for scrutiny is whether evaluation as it is 
practiced provides a convincing basis for making decisions about 
social programs. 

Federal programs offer a comprehensive system of social
services for Americans. This federal role evolved as a product of a
normative process, an expression of what ought to be. The authors
conclude that it is highly questionable whether evaluators can
provide a new, precise calculus capable of changing that situation.
In the final analysis, no matter how sophisticated and careful the 
evaluation of social programs may be, value judgments and 
political considerations remain paramount, controlling the 
decisions made by policymakers. 

But hope springs eternal. Congress keeps asking for more and
appropriates the funds to assure the delivery of new evaluations,
while officials in the executive branch pay lip service to the
evaluation of their programs. There is little evidence, however,
that much use is made of the products.
For the purpose of this study, evaluation is broadly defined to
include the presentation of evidence on program performance and
its impact. The reader will obviously note that this definition does
not attempt to pinpoint what constitutes relevant evidence. That's
why a whole book is needed on the subject, and a definition will
not do.

An earlier draft of the study was completed more than a year
ago but its publication was deferred due to pressure of other
responsibilities. Meanwhile, the draft received rather wide
circulation among evaluators, inside and outside government,
beyond the expectations of the authors. The agonizing attacks
persuaded us that the study is on the mark and that it might be of
interest to a wider audience.

We are grateful to our not very admiring critics. Their attacks
helped improve our product, but they are, of course, absolved
from any of the remaining transgressions. In addition to Harold
Orlans, who offered painstaking and valuable criticisms and
suggestions, we are also indebted to Henry Aaron, Gregory Ahart,
Burt Barnow, Michael Borus, Seymour Brandwein, Peter Henle,
Joseph Hight, Robert Levine, Arnold Packer, Howard Rosen,
Fred Siskind, Ernst Stromsdorfer, and Barry White for their
helpful written comments. The names of critics supplying oral
comments, no matter how forcefully presented, are omitted,
though their observations were carefully noted. The authors also
thank Nancy Kiefer for seeing the manuscript through its
iterations and preparing it for publication.

This study was prepared under a grant from The Ford 
Foundation to The George Washington University's Center for
Social Policy Studies. In accordance with the Foundation's
practice, responsibility for the content was left completely to the 
authors. 

Sar A. Levitan
Gregory Wurzburg

Washington, DC
September 1979
1

The Aspirations and 
Limitations of Evaluators 



In Pursuit of Some Uncertainty

Much of our social policy has been undertaken on faith. 
Education, established early as a good that should be available to, 
and indeed forced on, all children, has yet to prove its intrinsic
value. Nonetheless, it is considered almost unpatriotic to question
that commitment. Once social security became law, it successfully
withstood attacks on its basic underpinnings because of the
popularity of a nation-wide system of social insurance. The
validity of a federal role encouraging home ownership and
providing shelter for low-income families also remains strong
after more than forty years. 

In the 1960s, however, the federal government's role in the 
social policy arena departed dramatically from historical patterns, 
raising questions and demanding justification. The federal 
government was moving into policy areas that traditionally had 
been the domain of states or private organizations. In particular,
the increased federal role in education and public welfare sparked
debate on some fundamental questions of federalism. When the
federal government entered new fields that no level of government
had seriously considered before, there was further controversy
about the limits of governmental responsibility. The reactivation
of New Deal concepts of economic development, community
development, and manpower development after three decades was
viewed by some critics as a dangerous incursion, threatening to 
private sector opportunities. Targeted programs and civil rights 
laws aimed at providing compensatory services to the poor raised 
the ire of people who had mistaken the rhetoric of equal 
opportunity and equal justice for all, for reality. In this charged 
atmosphere of social change, inquiries about which programs 
worked, why they worked, and how they could be improved were
inevitable. Advocates of the new federal activism looked to
evaluations for evidence to defend their case. The detractors
sought ammunition to wage the battle against what they viewed as
conceptually unsound and administratively unworkable social
initiatives. 

Even before the social programs of the Great Society moved
center stage, another essential force was carving out a niche for
evaluators in the federal bureaucracy. In an attempt to introduce
some order into the processes of policy analysis and decisionmaking,
Robert McNamara, Secretary of Defense under
Presidents Kennedy and Johnson, applied a systems approach
entitled the Planning, Programming, and Budgeting System
(PPBS) to the review of that agency's missions. President Johnson
then foisted PPBS on other federal agencies. Evaluation came to
be viewed as part of the "feedback loop" whose assessments 
would be an essential ingredient of the policy formulation process. 

PPBS was neither as successful nor as popular as its architects
had hoped it would be. It failed to impose much order on the
management process and had little apparent effect on improving
the work of civilian agencies. But it did leave a legacy expressed in
the lingering notion that evaluation, research, and analysis, if
done correctly, could supply needed guides to policymakers. 
Indeed, the most enthusiastic proponents of PPBS endorsed it as a 
deus ex machina capable even of making decisions for the mortal 
administrator. 

The Application of Evaluation

The growing need for information about the effects of
Washington's bold new policies coupled with the magical powers
bestowed on evaluation by the PPBS movement significantly
legitimized the role of evaluation in the policy formulation
process. The evaluation trade has continued to flourish since the
early 1960s. Most federal agencies having responsibilities for
social programs maintain sizeable budgets for program assessment
activities. Congress writes evaluation requirements into virtually
all new social legislation. According to one estimate by the Office
of Management and Budget, about $140 million was spent in 1978
for evaluating federal social programs.

The multitude of evaluations are undertaken in the name of
providing information to legislators, administrators, and the
general public. The main questions they are designed to answer are
whether to retain a program, change its level of funding, or
modify its objectives or the strategies for achieving those
objectives.

Despite their broadening scope, evaluations have not succeeded 
in dispelling doubts, nor have they lent much order to the process 
by which social policy is formulated. Notwithstanding the appeal
of evaluation to policymakers, the actual connection between
evaluation and policy formulation is attenuated. In the final
analysis, there is still a mismatch between what evaluators have to 
say, and what policymakers and the public need to know. 

The reasons for the meager payoffs of evaluation are not hard
to fathom. The most basic deficiency is simply that the meaning
and purpose of evaluation are fuzzy. To paraphrase Alice's Mad
Hatter, evaluation means exactly what anyone may want it to
mean. Webster is concise in defining evaluation: "to find the value
or amount of." But this definition provides few guidelines for
most people who bandy the term to serve their own needs. For the
government bureaucrat, the term may trigger an alarm because
"evaluation" is perceived as a threat to the agency's programs.
Members of Congress may use the term as a political tool, while
the practitioner of the trade may perceive evaluation as an exercise
by an impartial observer to gain insights about a program.

Whatever the definition of the term, its meaning is not clarified
by the attempts of evaluators to impose rationality upon various
applications of the concept. Some evaluations are narrow inquiries
into programs that have been hastily designed to respond to
complex problems and whose impacts cannot be isolated from
other developments. Furthermore, the laws authorizing social
programs are filled with conflicting goals and vague specifications
for program results that are frequently non-observable,
non-quantifiable, or otherwise difficult to gauge. Even where hard
data can be obtained, standards for judging performance are often
arbitrary.

Amassing knowledge to evaluate the efficiency and effectiveness
of social programs also has inherent drawbacks. There is a
misplaced confidence in the assumption that social problems and
governmental solutions have a continuity that allows lessons
gleaned from observing them to be stacked like bricks. Social,
political, and economic conditions are ever-changing, and
program responses are altered almost as frequently. It is
misleading and presumptuous on the part of social scientists to
create simplistic models of program objectives and efforts,
pretending there is order where little exists. The consequences of
these conceptual shortcomings to evaluation are predictable and
dismaying: evaluators encourage, and policymakers too readily
accept, social policies predicated on fragmented views of social
problems and the effect of social change. The policies reflect a
mechanical internal consistency that flies in the face of reason,
where common sense dictates a holistic and judgmental approach.

State-of-the-art limitations are not the only restraint on the
usefulness of social program evaluations. The process is flawed
frequently by evaluators overselling their products and urging
excessive reliance on their findings and conclusions. Policymakers,
eager to shift their burden of responsibility to mechanical
decision models, are only too ready to accept the snake oil
remedies uncritically. Administrative obstacles also limit the
potential usefulness of evaluation. Policy designers and program
administrators frequently demand instant assessments of intricate
problems without adequately considering the perspective from
which evaluations are undertaken or the appropriate means of
procuring evaluation services. And some evaluators always stand
ready to supply quickie products that are inaccurate and cast
doubt upon the reliability of all evaluators. Related to this is the
difficulty of securing independent evaluators who are willing to
deliver conclusions based on best available facts no matter how
unfavorable the message may be to the sponsors.

The limited effectiveness of evaluation has prompted some
predictable reactions. Congress interprets these shortcomings as
arising from administrators' reluctance to use the available tools.
Consequently, it mandates more evaluations with each passing
year, shifting the requirements as to who should do the evaluation
and how they should do it. Meanwhile, evaluation administrators
often pursue their own agendas unmindful of larger program
objectives and agency missions. Other managers, lamenting what
they perceive to be meager gains from evaluation activities,
attempt to improve the yield by tinkering with the way evaluation
is managed. In the mid-1970s, the Office of Management and
Budget (OMB) tried to establish some norms for administering
evaluation. OMB was no more successful than others in clearly
defining the scope of evaluation, and its muscle in the executive
establishment offered a poor substitute for intellectual substance.
Not to be deterred by OMB's failure, the General Accounting
Office began in 1976 a more modest project that was still aimed at
standardizing management of evaluation, first in the Department
of Housing and Urban Development and then in the Department
of Labor. Its early results look no more promising than those of
OMB.

Legislators and administrators have been joined in the general
attempt to improve the productivity and end product of
evaluators. Academics and technicians have intensified their
search for the Holy Grail, attempting to invigorate the evaluation
trade by upgrading methodologies. In their labors, they try to
identify critical variables, fashion control groups, and collect data
that will provide answers to questions raised in the search for
solutions. But while they have been improving the sophistication
and elegance of their tools, the new methodologies appear as
flawed as simple efforts of the past. The added mountains of data
are more neatly arranged and tabulated, but definitive
information on bothersome policy issues remains elusive. Social
inquiry is being steered, increasingly, by the methodological tools
available, not by the fundamental questions being asked. Where
the methodologies should be providing broader, more balanced
pictures, they are focusing more and more on the fragments that
yield to quantitative measures and analysis, inducing policymakers
to focus their attention on the trees, and lose sight of the forest.
Neither do the new methodologies provide better answers to the
central questions of what evaluation should accomplish and what
role it rightfully deserves in policy design and formulation. There
is a surreal quality to these efforts as the social scientists attempt
to improve even more their techniques for counting the number of
angels that can dance on the head of a pin.

Flaws in the Foundation

The development of social program evaluation resembles an 
evangelical movement. The surge in social spending in the 1960s 
spawned the conditions and generated support for evaluations. It 
may well be that the only way evaluation could gain recognition as 
a major policy tool was by coming in with great fanfare. But
money alone does not create ideas; the movement lacked the glue
of rationality from the beginning. Consequently, the legacy of that
dramatic entrance is a weak foundation buckling under the
pressure of proponents attempting indiscriminately to push
evaluation beyond its capabilities.

The foundation for the evaluation of social programs was
weakened from the beginning by a number of flawed assumptions.
The most deficient was held by social scientists who asserted that
systematic inquiry could successfully disaggregate social problems
into discrete packages treatable with individual policy prescriptions.
Another underlying weakness was the belief held by social
scientists and evaluators that if more accurate and reliable
information were available, legislators and administrators would
use it as a basis for enacting and implementing legislation. Third,
evaluators, trusting the power of their tools, presumed themselves
able to identify important program variables in a real-life setting
and to isolate and describe changes induced by policy. In short, all
three flaws have reflected perverse positivism, perhaps an
inevitable by-product of the success of the scientific method in
pushing technology. Their impact has been to upset the balance
between subjective judgment and empirical observation, with
more reliance placed on the latter and less on the former.

The evaluation establishment has also fallen prey to serious 
shortcomings. Because of noncomparability among efforts, 
isolated and frequently conflicting findings from individual 
programs cannot be cumulated or woven together well enough to
form the solid base of experience advocated by evaluation
promoters. Differing assumptions have also muddied the waters.
Instead of the sensitivity needed to trace the development of social
science knowledge from disparate pieces of information, there
have been impatience and immodest expectations.¹

The Limited Potential

Flawed assumptions have severely diminished the utility of 
evaluation as a tool for policy analysis. But these same 
assumptions form the basis for a great deal of today's social
science. Therefore, a critique of the federal government's social
program evaluation policy confined to that approach is doomed to
an uphill struggle and runs the risk of stressing issues not central to 
evaluation. 

It is valuable instead to focus specifically on the mechanisms
and institutional arrangements for evaluation. That is where it is
most clearly identified as a "discipline" separate from social
science. That is where evaluation is most susceptible to change. 
And that is where our critique can be related to the broader, more 
systemic features of social inquiry. 

The authors are not arguing that evaluation fails to teach any 
lessons. But there are limits to what even the most conclusive 
findings can imply for future policy. When those limits are 
ignored, the potential for abuse is enormous, the possible 
consequences costly, and misguided conclusions may put in
jeopardy the stability of society.

1. Henry J. Aaron, Politics and the Professors (Washington: The Brookings
Institution, 1978), pp. 146-167.
State of the Art

Evaluating the federal government's social initiatives is more art
than science. The programs, even in their conception, are
imprecise tools for intervention. They frequently approach the
problems they are meant to solve from an oblique angle, and
provide only partial solutions. Valid and realistic standards for
judging them are critically lacking. Implemented in an
environment charged with emotional and political disagreement
and subject to a number of uncontrollable variables, the programs
defy careful and systematic evaluation.

The tools available for evaluating the programs are crude. In the
words of one federal evaluation administrator, "We are almost
pre-Copernican in our understanding of the social science
methodology in this area."¹ The conditions under which
evaluators operate make even the semblance of a scientific
methodology impractical. Evaluators are rarely in the position to
design programs in such a way as to provide adequate tests for
important elements. Claims about the ability to observe
results, directly or through proxy measures, are frequently
pretentious. The notion of achieving experimental conditions, so
central to scientific inquiry, is a virtually unattainable ideal. An
enormous number of variables influencing any program are
beyond the control of even the most imaginative and powerful
policymaker, to say nothing of a lowly evaluator. Yet, in spite of
the difficulties of even crude scientific inquiry, policymakers
demand assessments of social programs, and evaluators respond,
cranking them out with a vengeance.

1. William Morrill, Government Economy and Spending Reform Act of 1976, Hearings
before U.S. Congress, Senate Committee on Government Operations (Washington:
Government Printing Office, 1976), pp. 44^4«5.

Hoping to bring to their professions a degree of credibility that
the substance of their work sometimes fails to provide, evaluators
have collected an impressive array of tools for their trade.
Although the collection may seem to border on witchcraft, it is
intended to impart a sense of rigorous and systematic inquiry.

Ultimately, the methodology that evaluators adopt is shaped by 
the decisionmakers' inquiries and the likely applications they will
make of the findings. At different levels of decisionmaking,
different issues assume prominence. Program operators zero in on
program management and logistics, while the bottom line for
legislators is the difference a program makes. Between these two
extremes are many intermediate interests and decision points that
reflect complexities of the social programs themselves, and of their 
interaction with the world. 

Although the questions of decisionmakers do not fall into neat
pigeonholes, and the evaluations designed to provide the answers
are often less than tidy, the general methodological approaches
that form the basis of any inquiry can be broken into two genres:
process evaluations and impact evaluations. The first assesses
whether a program is a workable tool for change, and the second
assesses the effects of a program in achieving the desired change.
The focuses of the two approaches differ dramatically, as do their
respective methodological strategies.

Assessing Workability: Process Evaluations

Whether a program effectively attacks a social problem is of
secondary concern to the administrator whose prime task is to
implement the program. Will unemployed veterans enroll in
training courses for auto mechanics? Who will seek services from
a community health center and what number of clients will strain
existing facilities? In assessing workability, the evaluator focuses
on how faithfully intervention tools implied in policy mandates
have been implemented.

The scope of a process evaluation is confined to assessing what a
particular program has accomplished in meeting its immediate
objective, and assessing the "workability" of a program. Process
evaluations take as their starting point the presumption that a
program is conceptually sound, and focus on the evaluation of
"effort," including administrative practices, staffing patterns,
caseloads, and unit costs.² Without such evidence, it is difficult for
policymakers and other evaluation users to distinguish between
failures caused by process insufficiencies (lack of trained staff or
inappropriate target groups, for example) and impact-related
program design flaws (e.g., no relation between training and
employment). Edward Suchman identified four dimensions of
social intervention strategies more suited for separate process
evaluation than for analysis of program impact.³

Operational elements are critical first determinants of success or
failure. Decisionmaking structures, political interactions, staff
competence, the physical condition of facilities, financial
management practices, and the level of supportive services all play
a part in how effectively a program concept is implemented.
Although the concern is not with measuring program results,
process evaluations can imply as much about program success as
can impact studies. Operational elements are as important an
influence as program strategy in the pursuit of immediate program
objectives and long range results. An analysis of the public
Employment Service found personal leadership and political
support to have important effects on the success of the agency's
programs.⁴ Studies of implementation of the Comprehensive
Employment and Training Act and of local management of
employment and training programs have shown clear links
between operations and program results.⁵

2. Joseph S. Wholey, et al., Federal Evaluation Policy (Washington: The Urban
Institute, 1973), pp. 95^.

3. Edward Suchman, Evaluative Research: Principles and Practice in Public Service
and Social Action Programs (New York: Russell Sage Foundation, 1967), pp. 64-69.

4. Mark L. Chadwin, et al., The Employment Service: An Institutional Analysis
(Washington: Government Printing Office, 1977).

It is also important to know precisely who is being served. This
is especially significant in the case of most federal social programs
because the intended beneficiaries of the programs normally have
little political clout; consequently, there has always been the
concern that persons with less need would benefit from the
programs. Since resources are limited, administrators must
establish priorities and choose to serve only part of the target
population. "Disadvantaged youth," for instance, is a group that
encompasses more persons than Congress ever envisioned in
designing the several youth programs to serve them. The choice of
which subsections to serve is an important determinant of what a
program does, how it operates, and what it finally achieves. The
evaluator has to specify, even if the administrator has failed to do
so, who is being served by a particular program. To do otherwise
may hinder an accurate interpretation of the results.

The third dimension of process evaluation concerns the
environmental conditions under which a program operates.
Knowledge about them is useful for determining whether or not
the lessons learned from a program are transferable.
Generalizations about what does and does not work must be made
with an understanding of the qualities that are inherent in a
program design as contrasted with outside factors that impact
upon the program.

The final dimension of process evaluation is the determination
of whether a program succeeds as an intervention tool in achieving
its immediate objectives. Do those who complete a training
program acquire a saleable skill? Do rehabilitation programs leave
enrollees in a position to compete in the job market? Assessing
these proximate results, however, is not the same as assessing
effectiveness. For example, the Elementary and Secondary
Education Act of 1965 succeeded in channeling more education

5. Randall Ripley, CETA Prime Sponsor Management Decisions and Program Goal
Achievement (Washington: Government Printing Office, 1978).
funds to schools with high concentrations of children from
low-income families and in boosting outlays per low-income pupil.
But the effectiveness of that strategy in equalizing education
opportunities or in equalizing academic achievement for
low-income groups remains unproven.6 Similarly, the Youth
Employment and Demonstration Projects Act of 1977 succeeded
in prodding local manpower and education administrators to work
on joint projects, but it is still unclear whether increased
cooperation has brought about increased employability
opportunities for youth.7

Gauging Results: Impact Evaluation

In contrast to the narrow scope of process evaluation, program 
impact evaluation attempts to examine accomplishments and
therefore has broader appeal.

Underlying the analysis of program impact is the hope that the 
investigators can find out which tactics work and how successful
the programs utilizing them have been. In a controlled
environment where the only variable would be the program being
tested, isolating the differences attributable to it would be a simple
task. In the case of employment and training programs designed to
increase employability and raise earnings, comparisons could
presumably be drawn between employment rates and income
levels before and after the program. To determine if education
finance programs had equalized per pupil expenditures, educators
might look for increases in academic achievement among students
in poorer schools. 

The environment in which the evaluator of social programs
functions, however, is not controlled. Unknown, uncontrollable,
and unpredictable forces are at work. Trying to estimate what
would have happened in the absence of the program and inferring
its impact are frustrating experiences. In the employment and
training arena, changes over time in labor demand, wage levels,

6. Sar A. Levitan and Robert Taggart, The Promise of Greatness (Cambridge, MA:
Harvard University Press, 1976), p. 119.

7. Gregory Wurzburg, "Improving Job Opportunities for Youth," National Council
on Employment Policy, 1978, pp. 4M9.
and the structural composition of the work force can distort or
mask the effects of government intervention. Eligibility criteria 
and benefit levels, which vary widely among states, influence 
client response to changes in welfare programs. Measurement 
problems also cause difficulties; valid, reliable, and accurate social 
indicators are rare, and precision is illusory much of the time. 

In real life, measures of the effects of social programs are 
relative, not absolute. The evaluator's only choice is to control the
nonprogram variables as much as possible in attempting to isolate
the change caused by the program. The designs that have been
developed vary in their complexity, the presumption being that
greater sophistication of the research design improves the chances
of isolating and estimating the impacts that programs have on
client groups.8

More sophisticated designs are inevitably more costly, though, in
terms of time, money, and flexibility. Steep tradeoffs and
relatively low marginal yields on increasingly sophisticated
evaluative designs make the most sophisticated modes the
exception rather than the rule.

It's Not That Simple

Evaluators might reach a consensus about objectives of 
different approaches to solving social problems. But in the process 
of designing even the simplest evaluation, the choice of variables 
to be tested, participants to be observed, and indicators to employ 
as measures of progress, are explicitly or implicitly a matter of 
judgment. Yet those subjective judgments determine the kind of
data that will be collected, and affect the conclusions that may be
derived from them. Not merely incidental qualifications attached
to certain conclusions, these subjective judgments impose
fundamental constraints on what can be inferred validly from the
evidence produced by evaluations.

Evaluation methodologists are fond of asserting that the
barriers to objective evaluations could be removed if only there 



8. Suchman, op. cit.
were clearer specification of goals and objectives, improved access
to available data, and a larger role for evaluators in program
design and administration. The U.S. General Accounting Office
criticized the evaluation system in the Department of Housing and
Urban Development because policy officials were not effectively
communicating program goals to evaluators, and because the
goals were not quantified.9 Too frequently, evaluators have
stubbornly refused to recognize or acknowledge that
methodological and administrative impediments to evaluation are
not a function of intractable program operators or legislative
vagaries but, rather, are intimately linked to the very nature of
social programs.

Usually, neither the laws authorizing national social programs
nor the agencies charged with implementing them are explicit
about what they want to do. The old adage, "when you don't
know where you're going, any road will get you there," applies
frequently in this sphere because lawmakers find it convenient not
to spell out their real intentions.

Title XX of the Social Security Act is a case in point. The Act 
authorizes grants to states for social services "[to encourage] each
state ... to furnish services directed to the goal of achieving or
maintaining economic self-support to prevent, reduce, or
eliminate dependency. . . ." The legislation offers no guidance as
to whom the legislation is aimed at, what outputs might be expected,
or what the intermediate objectives might be. Little more is
offered to the project designers. Indeed, the evaluator would have
a difficult time determining reasonable criteria for judging
achievement or output. Liberals found the notion of federal
support for social services attractive. Conservatives got the
language they favored, and none of the parties involved seemed to
have any qualms about saddling Uncle Sam with the payments for
various social services. None championed spelling out the
appropriate services and client eligibility criteria.

There are obvious cases where objectives are intentionally made 
ambiguous in order to draw broad and multifaceted support for

9. Comptroller General of the United States, HUD's Evaluation System: An
Assessment (Washington: General Accounting Office, 1971), pp. ii, 39.
the legislation. Various groups have often supported the same
broad goals for different, and sometimes even conflicting,
reasons. Charles L. Schultze cites the Elementary and Secondary
Education Act of 1965 as an example of "legislative pluralism" in
which a law was able to pass because the objectives were
sufficiently vague to attract the support of diverse interest groups.
According to his analysis, this law gained the support of a
coalition of three groups that, separately, could not have exerted
sufficient pressure to achieve passage: parochial school interests,
advocates of federal aid to education, and antipoverty warriors.
To avoid offending any partner, the designers of the legislation
used sufficiently nebulous objectives to attract maximum support
without alienating any participating group.10 The law declared
that "The policy of the United States [is] to provide financial
assistance ... to meeting special educational needs of
educationally deprived children." Any evaluator would find it
difficult to gauge the performance of a program in terms of the
"objectives" of the Act. Given this noble purpose, there would
also be very little that any legislator's constituents would find
particularly offensive. However, it would also offer very little to
the evaluator trying to establish more scientific objectives as a
basis for assessing program performance.

Even where evaluators find themselves reviewing programs with 
well-specified objectives and quantifiable goals and their
immediate tasks become somewhat easier, the challenge to do a
worthwhile assessment of a particular program and to have an
impact on policy does not diminish. For example, the Youth
Employment and Demonstration Projects Act of 1977 spelled out
legislative objectives in great detail. Those objectives were
augmented by the Department of Labor. However, whether
federal officials will be in a position to evaluate the program with
any greater precision than if the laws were vague about its
objective is no certainty. Among other factors, demographic
changes in the 1980s, or the renewal of a military draft may 



10. Charles L. Schultze, The Politics and Economics of Public Spending (Washington:
The Brookings Institution, 1968), pp. 47-49.
completely overhaul programs under the Act and, consequently,
alter the task of the evaluators.11

Goals, objectives, and purposes that are articulated in 
legislation do not represent inviolable verities, but merely a
consensus of what is desirable and feasible at one point in time.
The social environment, perceptions of needs, and impacts of the
programs are continually changing. Numerous external forces are
not foreseeable at the legislative stage. "Evaluators with their
attention concentrated on declared objectives may overlook
unexpected results of equal or greater value."12 Furthermore, neat
statements of objectives do not always reflect the priorities and
policies of a federal agency or of legislative sponsors. Hidden
agendas and non-specified political objectives are often as high a
priority to decisionmakers as stated design objectives. In addition,
spillover effects and unanticipated interactions and developments
in the program environment may significantly affect what the
programs accomplish, as well as the implications of particular
program results.13

To the extent that evaluators confine their perspectives to these
narrow objectives stated for public consumption and for history to
record, they can lay claim to building a degree of objectivity into
their work. It is almost certain that a like-minded evaluator,
guided by the spirit of scientific method, could replicate the
approach, but the cost would be steep. Their revisions are doomed
to be sterile works marked by slavish internal consistency, but with
little relevance to the needs of policymakers having to make
choices in the real world. 

A potentially valuable alternative is a much more problematical
approach in which evaluators' perceptions of the program goals



11. Office of Youth Programs, "A Planning Charter for the Youth Employment and
Demonstration Projects Act of 1977," Department of Labor, August 1977.

12. Garth Mangum and John Walsh, Employment and Training Programs for Youth:
What Works Best for Whom? (Washington: Government Printing Office, 1978), p. 53.

13. Carol H. Weiss, "Where Politics and Evaluation Research Meet," Evaluation, Vol.
1, No. 3, 1973; and Michael E. Borus and William R. Tash, Measuring the Impact of
Manpower Programs: A Primer (Ann Arbor: Institute of Labor and Industrial Relations,
University of Michigan-Wayne State University, 1970).
would replace the formal legislative intent. A useful and intelligent
evaluation of the impact of social programs has to be more than a
mechanical comparison of what was formally attempted and
actually accomplished. Moreover, insistence on methodological
rigor may exclude information that cannot be incorporated in a
tidy research design but that is vital to understanding policy issues.
Evaluation research should be flexible enough to permit
redefinition of the objectives and issues it addresses. Social
programs function in changing environments, and the standards
used for judging them should reflect these changes. Granted,
evaluation may become more art than science, but, more
important, it would become a better mirror of reality.

What Is an Appropriate Indicator?

The selection of indicators for measuring the accomplishments 
of social programs can be another significant source of distortion.
This process is intrinsic to the choice of goals and objectives, and
no less vulnerable to institutional and state-of-the-art constraints.
In the jargon of the trade, indicators must meet tests of validity
and reliability and they should reflect substantive change caused
by the program being evaluated.14 The problems associated with
the unemployment rate and crime index illuminate the difficulties
of measuring different aspects of social well-being.

The unemployment rate is a mainstay for identifying labor
market pathologies and economic malaise. Yet it is a misleading
indicator of social and economic welfare because it does not fully
reflect the problems of underemployment and inadequate earnings
that are at least as relevant as unemployment in ascertaining
overall social and economic welfare.15 Even as a limited purpose
indicator, however, it is deficient. Measurement problems plague
the current standards and frequently they are neither accurate nor
reliable.16

14. Suchman, op. cit.

15. Sar A. Levitan and Robert Taggart, Employment and Earnings Inadequacy: A New
Social Indicator (Baltimore: The Johns Hopkins University Press, 1974).

16. The National Commission on Employment and Unemployment Statistics, Counting
the Labor Force, a draft prepared for public comment (Washington: Government Printing
Office, 1979), Ch. 3.
The U.S. crime index suffers from lack of credibility because it
does not reflect the public's notion of crime. It is also vulnerable
to variations in unstandardized data collection procedures. Failing
to weight for different kinds of crime, and going for more than 44
years without adjustments for changes in demographics or
changes in prices, the index has outlived any usefulness it may
have had.

Most critical indicators used in evaluation are less global in their
concerns than unemployment measures or the crime index. They
still fall victim, however, to some flaws. More directly
program-related is the recent Department of Health, Education
and Welfare campaign to encourage motorcyclists to wear safety
helmets. The measure of program effectiveness selected was the
change in the number of motorcyclist fatalities. Analysts hoped to
establish a causal link between the number of motorcyclists
wearing helmets and the number of lives saved.18 However, HEW
evaluators attributed all of the subsequent drop in motorcyclists'
fatalities to the program and conveniently disregarded factors
unrelated to the safety program that might have contributed to the
decline in fatalities.

In the Department of Labor's Employment and Training 
Administration, officials adopted 13 indicators to assess the
ongoing effectiveness of local program administrators'
performance. The quarterly indicators measured performance with
respect to unit cost, proportion of participants placed in
unsubsidized jobs, and post-program earnings. It was presumed
that a high score would correlate with positive long term program
effects such as increased earning power and enhanced
employability for participants. While this choice of indicators is
appealing as a measure of good management, it does not appear to
be a very good predictor of program impact. Analysis has shown
there to be very little relationship between what the indicators show a



17. Judith Innes de Neufville, Social Indicators and Social Policy (New York: Elsevier
Scientific Publishing Co., 1975), pp. lOMI?.

18. Harley Hinrichs and Graeme Taylor, Program Budgeting and Benefit-Cost Analysis
(Pacific Palisades, CA: Goodyear Publishing Co., 1969), pp. 240-254.



program to be doing, and the final impacts of the program.19 Lack
of uniformity among the procedures used by different local
administrators to record data also undermines the accuracy and
reliability of the information upon which the performance
indicators are based.

The Work Incentive Program (WIN), established to help adults
receiving Aid to Families with Dependent Children, serves as a
useful example. Labor's Employment and Training Administration
announces periodically WIN's achievements, as it did in the
announcement of December 21, 1978 that the "WIN program
placed 300,000 welfare recipients in jobs last year." The
announcement went on to claim that "savings to the taxpayers
amounted to well over double the cost of the WIN program." This
pre-Christmas announcement must have been adequate cause to
raise the Yuletide spirit of all readers. The trouble is that the
"analysis" failed to call to the attention of the readers that welfare
recipients have traditionally moved in and out of the AFDC
program and that most of the 300,000 placed (assuming the
statistics are correct) would have left the rolls in the absence of
that support.
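
The gap between the announced figure and the program's own contribution can be illustrated with a minimal sketch. Only the 300,000 gross placements come from the announcement quoted above; the shares of recipients who would have left the rolls anyway are hypothetical values chosen for illustration, and the function name is likewise an assumption.

```python
def net_placements(gross_placements: int, share_leaving_anyway: float) -> int:
    """Placements left after removing recipients expected to leave the rolls anyway."""
    if not 0.0 <= share_leaving_anyway <= 1.0:
        raise ValueError("share_leaving_anyway must be between 0 and 1")
    return round(gross_placements * (1.0 - share_leaving_anyway))


GROSS = 300_000  # gross placements quoted in the announcement

# Hypothetical shares of recipients who would have left without the program.
# These values are assumptions, not figures from the source.
for share in (0.5, 0.7, 0.9):
    net = net_placements(GROSS, share)
    print(f"{share:.0%} would have left anyway: net placements {net:,}")
```

Whatever the true share, the claimed savings to taxpayers would have to be recomputed on the net figure rather than the gross one.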

On the other hand, evaluating a program with inappropriate 
measures may condemn an effort that is achieving legitimate and
desirable results. When the General Accounting Office (GAO)
evaluated the Neighborhood Youth Corps, it concluded that the
program was ineffective because it failed to reduce the school
dropout rate for disadvantaged youth. However, the dropout rate
is a function of far deeper social forces and systemic shortcomings
than NYC could address alone.20 The GAO evaluation almost
certainly would have been more useful if it had examined the
income transfer spillover effects of the youth program on poor
families.

Early evaluations of the countercyclical public service jobs 
created under the Comprehensive Employment and Training Act

19. Robert S. Gay and Michael E. Borus, "Validating Performance Indicators for
CETA," Employment and Training Administration, U.S. Department of Labor, 1978.

20. General Accounting Office, Federal Manpower Training Programs - Conclusions
and Observations (Washington: General Accounting Office, 19T7).
of 1973 harshly denounced the extent of "fiscal substitution."
Evaluators stated that the net job creation effect of the public
service job programs was minimal because local governments were
laying off municipal employees one day and hiring them with
federal manpower dollars the next. Subsequent studies, however,
showed a different picture. Without the job creation funds, local
governments would have been forced to lay off workers and cut
municipal services.21 While the simple before-after analysis of the
impact of job creation measures showed a greatly diminished
impact on local unemployment, the more sophisticated analysis
accounting for what would have happened without the measures
indicated two positive accomplishments: first, the programs were
having a larger impact on total employment than had been
previously suspected, and second, the job creation measures were
proving to be very effective countercyclical revenue sharing tools.
The latter is valuable on a more abstract level as well in that it
recognizes secondary impacts beyond the original design
objectives of the programs that nonetheless proved to be
important. In short, evaluations assessing the merits of the job
measure strictly in terms of its stated objectives may be
methodologically proper but are often misleading indicators of the
accomplishments of the programs.

A very serious risk associated with the use of any indicator is 
that analysts may accept without questioning the story they are 
purported to tell. Even if indicators are valid, accurate, and 
reliable proxies for the variables that need to be examined, 
correlations do not necessarily imply causation. Indicators
also must have a finite number of components if they are to
remain consistent and useful over time. However, the cost of that
consistency is high where variables that are either hard to measure
or only occasionally important have been omitted for the sake of
administrative convenience or to save costs. The alternative of
partial measurements is often justified on the grounds that they
provide some enlightenment. "The problem, of course, is that a little



21. Richard P. Nathan, Robert F. Cook, Janet M. Galchick, Richard W. Long, and
bit of the truth is sometimes taken for the whole--and half-right
analysis can be worse than none."22

Effective Compared to What?

Even if useful indicators were easy to choose, and goals and 
objectives had been clearly established, a nagging question would
remain: "What would have happened in the absence of the
intervention complex?" Evaluation designs must fully assess
"before" and "after" when measuring the net effects of social
programs. 

The biggest obstacle faced in this aspect of the evaluator's task
is the difficulty of securing appropriate control groups. The
overall objective is to gauge impact by comparing the performance
of program completers to how well they would have fared in the
absence of the program. Since it is impossible to measure both the
effects of participation and nonparticipation on the same
individual, evaluators try to find a group of persons similar to
program participants in every way except with respect to program
enrollment. The hope, then, is that any post-program differences
observed between the control group and program participants can
be attributed to the effects of program participation.
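
The comparison described above reduces, in its simplest form, to a difference between group averages. The sketch below assumes the two groups really are comparable, which is the very assumption the following paragraphs call into question; the earnings figures and the function name are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def estimated_impact(participant_outcomes, control_outcomes):
    """Difference between mean post-program outcomes of the two groups.

    The result can be read as a program effect only if the control group
    matches the participants in every respect except enrollment.
    """
    return mean(participant_outcomes) - mean(control_outcomes)


# Hypothetical post-program annual earnings in dollars (illustrative only).
participants = [5200, 6100, 4800, 7000]
controls = [5000, 5600, 4700, 6300]

print(estimated_impact(participants, controls))
```

If the control group differs from the participants in motivation, background, or local labor market, the same subtraction still yields a number, but the number no longer isolates the effect of the program.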

There are major impediments to obtaining appropriate control 
groups, though. The evaluators encounter their first challenges in
finding enough persons with comparable characteristics, setting
up the group, and keeping it intact. Evaluators studying a series of
rural midwestern youth projects were unable to assemble a control
group because, once the projects were implemented, there were
too few nonparticipating Indians remaining for a satisfactory
control group. Studies attempting to measure the effect that youth
employment programs of the late 1960s had on school retention
were plagued by an inability of the evaluators to assemble a
control group comparable to the program participants with
respect to important socioeconomic variables.
Control group methodologies are most susceptible to the
problems associated with comparability assumptions. A
longitudinal study attempting to measure the duration of gains
experienced by Job Corps participants relied on "no-shows"--
persons who were accepted for the Job Corps but failed to enroll
at a center--as a substitute for a genuine control group. By doing
so, evaluators undermined the impact of their conclusions with
doubts as to whether the no-shows were comparable to
participants, or instead, failed to participate in Job Corps because
they did not need the Job Corps' services.23 In another follow-up
study that sought to measure earning gains of trainees in
employment and training programs, the evaluator constructed a
control group using records of persons from the general working
population who were part of the Social Security Administration's
continuous work history sample.24 The validity of this approach
was questioned because Social Security data do not capture critical
differences between the general working population and persons
from poverty populations participating in the training programs.
Using the continuous work history sample as a control, all the
evaluator could validly show was "... how the changes in the
earnings of MDTA trainees compare with those of the average
worker. Since most trainees come from disadvantaged
backgrounds, it is unreasonable to expect their earnings to rise by as
much as the average worker. . . ."25

There are also ethical problems associated with the use of 
control groups. The idea underlying their use is that they are 
equivalent in every way to the experimental groups except that 
they do not take part in the program being evaluated. The best
way to assure equivalence is to randomly select members of both 
the control and experimental groups from the same population. 
Two ethical questions arise: Is it fair to exclude any person from
participation in a program? Even if exclusion is justified because
limited resources restrict the number of persons that can be served,
is it just to select that person in a random, arbitrary fashion?
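
The random selection that raises these ethical questions is mechanically simple. The following is a minimal sketch, assuming a single pool of eligible applicants and a fixed number of program slots; the function name, the applicant labels, and the seed are hypothetical.

```python
import random


def random_assignment(applicants, n_experimental, seed=None):
    """Split one applicant pool at random into experimental and control groups."""
    pool = list(applicants)
    if not 0 <= n_experimental <= len(pool):
        raise ValueError("n_experimental must be between 0 and the pool size")
    rng = random.Random(seed)  # a seed makes the draw reproducible for audit
    rng.shuffle(pool)
    return pool[:n_experimental], pool[n_experimental:]


# Hypothetical pool of ten eligible applicants and four program slots.
applicants = [f"applicant-{i}" for i in range(10)]
experimental, control = random_assignment(applicants, 4, seed=1977)
print(len(experimental), len(control))
```

Because every applicant has the same chance of selection, the two groups are equivalent on average, which is what makes the later comparison defensible. It is also precisely what makes the exclusion of any one person arbitrary.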

In medical research, when new drugs and medical procedures
are being tested, the requirement for strict experimental conditions
and control groups is commonly accepted. But in the early stages,
where the potential risks are greatest, animals are used. Only when
the obvious risks detectable in animals are first removed is the use
of human control groups in a clinical setting generally accepted.26
In contrast, use of control groups in social experimentation and
program evaluation has not been able to acquire such a high degree
of acceptance. People have to be used from the outset, when the
unknowns are greatest, and often with no more justification than
a hunch or political pressure to attempt any solution. Social
experimentation is disadvantaged also because it cannot be carried
out in a protected, clinical climate. Rather, it must operate in the
public forum where it is open to public scrutiny and answerable
for short-run as well as long-run effects. It also tends to be a
controversial issue because it concentrates so heavily on the
low-income, "disadvantaged" groups, and many observers are
convinced that they have been dissected, analyzed, and studied
enough.

Because of the ethical considerations, the use of control groups 
in social research has touched some very sensitive nerves in past
social welfare undertakings. When guidelines were being
developed for the evaluation of Elementary and Secondary
Education Act Title I programs, the use of control groups was
considered. The Office of Education decided against them to
avoid "some deserving Title I kids [being] denied services for the
sake of experimentation. At the state level the feeling was that
Title I was not a research program and therefore control groups
were not to be used."27 When researchers tried to select a random

26. John Mann, "Technical and Social Difficulties in the Conduct of Evaluative
Research," in Readings in Evaluation Research, Francis Caro, editor (New York: Russell
control group in rural youth employment and training programs,
they encountered resistance because the process of random
selection was not acceptable to community leaders and school
administrators.28 The solution chosen was to draw the members of
the control group from another jurisdiction that was not
participating in the program.

Social Experimentation: Panacea or Placebo?

Some critics argue that, considering all the time, effort, and 
resources invested in social initiatives in the last 15 years, the
payoff in added knowledge and insights has been extremely
limited. Causal relationships are as difficult as ever to document.
There is still too much speculation and too little proof to
distinguish between significant and inconsequential variables. There
is a lack of coherence and consistency about the meaning, to say
nothing of the lessons, that can be garnered from the data
collected in connection with the vast social program experience of
the 1960s and 1970s.

Systematic social experimentation is often held up as the
Rosetta Stone that will make it possible to decipher the knowledge
buried in voluminous, disparate and incomprehensible data. Alice
Rivlin views systematic social experimentation as a policy tool in
which "innovation should be tried in enough places to establish its
capacity to make a difference and the conditions under which it
works best."29 Experimentation may also provide a mutually
beneficial link between basic and "applied" (or evaluative)
research. Advocates see experimental programs as contributing to
the knowledge that analysts need to make substantive
contributions to program evaluation and policy design.30
At first blush, systematic social experimentation appears to be
an attractive solution to at least some evaluators, as well as to
many social scientists. When it works, it can identify cause-effect
relationships essential for sound program evaluation and
development. But experimental strategies are costly and demand
elaborate design and meticulous attention to detail. It is not clear
at all that the returns justify the effort. Two sets of factors limit
the final usefulness of social experimentation. First, the presumed
experimental conditions that distinguish this strategy from less
rigorous approaches are almost impossible to achieve. Second,
even where they supposedly can be achieved, the simplistic notions
about cause and effect that underlie experimentation strategies fail
either to capture the full effect of program intervention or to
document satisfactorily the influence of nonexperimental
variables.

The New Jersey Graduated Work Incentive experiment 
illustrates many of the pitfalls that exist in experimental design. It
was sponsored by the antipoverty agency, Office of Economic
Opportunity, in 1967 to gain insights about the impacts of a
guaranteed income program upon work behavior. Experimental
and control groups were assembled in four cities to test the
reactions of participants to different combinations of guaranteed
income levels and marginal tax rates.

The experiment cost $8 million, of which about one-third went 
to actual cash payments for the participating families. The 
experiment itself ran four years and more than two additional 
years were taken up in planning and later analysis. The final report
was published in July 1974. Based on the evidence collected in the
experiment, the authors concluded that there was no significant
pattern of decreased work effort associated with the guaranteed
annual income program.31

The overhead of the New Jersey experiment is a useful starting
point for an examination of the costs of systematic
experimentation. Two-thirds of the total cost of the project were
allocated for
study and evaluation. This included assembling the research staff,
selecting the sample populations for the control and experimental
groups, conducting interviews, terminating the experiment, and
preparing the final report. The experiment proved more expensive
to set up than anticipated because of the difficulty of getting an
adequate racial balance in each experimental income-range
sample. Thorough and controlled documentation required
frequent interviews and reporting. Control group participants
were interviewed quarterly and reported income monthly.
Experimental group participants were interviewed monthly and
reported income bi-weekly.

Nonetheless, obvious flaws could not be avoided. In fact,
inherent in the very design, so carefully constructed, was an
observation process that interfered with normal behavior. The
experimenters did not analyze the ramifications of extensive
observations, only noting the effects of observation insofar as it
made the experimental group more adept at filling out the income
forms than the control group because the former completed them
more frequently.

Experimentation with the wrong variable is a potential flaw that
can completely vitiate the experimental approach. The primary
objective of the New Jersey experiment was to test how different
minimum income levels and marginal tax rates affected the
incentive of participants to work. The evidence showed no
significant pattern of persons dropping out of the labor force in
response to high support levels or high marginal tax rates. The
experimenters concluded that a negative income tax plan would
not cause people to deliberately cut back their income. Yet later
evidence indicated that, while the effects of incremental changes in
support levels and marginal tax rates were not significant, those
associated with the level of in-kind benefits appeared to be. These
authors interpret the results of the experiment as suggesting that
while increases or decreases in marginal tax rates might not induce
persons to change their employment status, changes in their
eligibility for medical benefits and housing aid probably would.

The high costs of an experiment do not ensure conclusive
results. Four years after the New Jersey experiment was
completed, HEW and Labor unveiled the results of another
guaranteed income experiment carried out at the other end of the
continent. The Seattle, Washington and Denver, Colorado income
maintenance experiments, involving 4,800 families at a cost of $60
million and conducted over a three-year period, found that income
guarantees did reduce incentive to work and dependency. The
investigators in this demonstration also noted that the guaranteed
income increased divorce rates, a variable ignored in the earlier
project.

The negative findings did not persuade or discourage true
believers though. One analyst suggested that the measured decline
in marital bliss "is an acceptable price one must pay for greater
equality between men and women. . . ." Another observer
declared, "There is more to marriage than economics, after all."
More fundamental, a closer examination of the data disclosed the
need to disaggregate the sample studied. Averages can be
misleading, sometimes hiding more than they reveal. Some
participants reduced or stopped working, but they used their time
to gain an education leading to better jobs. Others worked fewer
hours to take care of their children.

Underlying this redefinition of issues is the dilemma that
constantly faces evaluators and which assumes particular
importance in an experiment: Which variable should be tested?
Evaluators often go after data on race when they should be paying
attention to sex, age, or education of program enrollees. Or they
may try to link low income and inadequate medical care to high
infant mortality rates instead of more important environmental
factors. Then, when the findings turn out to be unsatisfactory,
evaluators can hide behind the refuge that the variables examined
were irrelevant or should not be controlling policy formulation.

The Youth Incentive Entitlement Pilot Projects, mandated by
the 1977 amendments to the Comprehensive Employment and
Training Act, is another experiment whose high costs may fail to
pay off in insights. This 30-month, $220 million search for cures to
youth unemployment uses elaborate experimental designs to test
the effect of a guaranteed job on school retention. Added
sophistication may not make the answers any less elusive though.
So far, the projects seem as wrong-headed in their premise as they
are ambitious in their research objectives. Neither education nor
employability and training experts have found much evidence that
a lack of income is the only factor, or even the most important of
many factors, contributing to the dropout problem. Even if it
were, researchers are finding that the notion of a guaranteed job
introduces a host of complications that may prevent evaluators
from testing their central hypothesis.

Choosing the key variables is like shooting in the dark.
Consequently, systematic experimentation may be inefficient. On
a pilot basis (the only economically feasible way to use systematic
experimentation), the number of beneficiaries is small compared
to that in a full-scale program. If the experiment fails, losses are
minimized. But if it succeeds, the time spent on the experiment (six
years for the New Jersey experiment) is lost. And if national
conditions change, the benefit of the successful experiment may
never be realized. Harold Orlans describes the problem well:
"Since the government's goals will and should change, as
conditions dictate, the timing of research is as critical as its
technical adequacy. A quick study yielding gross estimates can be
more useful than a laborious study producing more information
about a situation that no longer exists."

Another factor that diminishes the value of social experimentation
is the incremental nature of gaining insights into social
phenomena. Experience indicates that social knowledge does not
make giant leaps forward. Even the best run social experiment is
likely to show the way for making only marginal improvements. In
light of this, the reluctance of program administrators to justify
the expense of experimentation when less systematic approaches
can yield almost the same insights and opportunity for
improvement is understandable.

Perhaps the most serious drawback to systematic social
experimentation concerns the assumption that experimental
conditions are attainable in a social setting. Three uncontrollable
outside factors influenced the New Jersey experiment. The first
outside influence was anticipated: participant and community
attitude towards the experiment. There was reluctance to
cooperate with "do gooders" studying the poor. Program
designers were also concerned about participant reactions to the
varying of benefits on a random basis, as well as community and
control group acceptance of an experiment that excluded benefits
altogether for the latter group. Inevitably, some militant
community groups attacked the experiment, but their resistance
collapsed when they became convinced that the choice was
between half a loaf or none; while the experiment precluded giving
money to the control group, the experimental group would have
received no support in the absence of the trial program. The
obstacles were overcome in that experiment, but the resistance to
being guinea pigs is chronic and hard to overcome. It may be
insurmountable when the benefits of participation are not clear.

A second serious disturbance in the New Jersey experiment was
the January 1969 change in state law that qualified families headed
by unemployed fathers to receive AFDC. This change, also
beyond the control of the evaluators, altered an important
experimental precondition. One of the reasons New Jersey was
originally selected as a site for testing the negative income tax was
the lack of any competing welfare program for unemployed
fathers (or plans for one). In order to minimize the competition
and overlap between the two programs, the experimenters added
another group of participants who received higher guaranteed
support payments. This raised the costs of the experiment and
necessitated payments in excess of the poverty guidelines.

The political context of an experiment can destroy the illusion
of a social laboratory. The New Jersey experiment encountered
just this kind of problem as its third external factor. Although
social experimentation is more akin to basic research than are
other program evaluation techniques, it is still apt to be more
topical than social scientists may wish when experimental niceties
yield to political exigencies. Two observers of the New Jersey
study noted that "it was inconceivable to everyone that political
reality would overtake the experiment." Yet, in August 1969,
when the project had been underway barely a year, the Nixon
Administration unveiled its Family Assistance Plan. Seeking to
reform the welfare system, it included a negative income tax,
among other features. Since the New Jersey experiment was the
most prominent source of empirical data about the relationship
between a negative income tax and work behavior, the project
administrators were called to testify before the House Ways and
Means Committee. At their first appearance, they responded to
the congressional inquiries in general terms. Thereafter, they
released their preliminary findings that supported the Family
Assistance Plan in principle. The report opened the experiment to
close scrutiny by the General Accounting Office and the Senate
Finance Committee. The latter attempted to obtain confidential
records of participating families, a potentially disastrous action
from the experimenters' point of view. The disclosure issue died,
but the project administrators were hard put to counteract the
GAO criticisms. The experimenters learned a lesson and confined
subsequent public comments to specific questions without taking
sides in the debate.

After their encounter with the political perils in a researcher's
life, the project officials noted that as experiments became more
relevant to current political decisions, legislative inquiries were
more likely to pose sensitive issues and threaten the success of an
experiment. It became evident that, to be effective, social research
requires high standards of scholarship and political skills, that it
cannot be merely a mental exercise for academics in ivory towers.
It is a process of inquiry that must be able to withstand the
scrutiny of a curious and sometimes hostile public, not to mention
the scrutiny of other scholars. The work of a social scientist
demands sensitivity to the implications of the answers being
sought and to the pressures that the policy arena exerts.
Otherwise, as two observers remarked in regard to the New
Jersey experiment, serious complications may arise not merely
from bad judgment, but even considerably more from plain bad
luck.

Bad luck, however, is not always solely a matter of random
chance. While specific problems cannot always be predicted,
general difficulties can usually be anticipated with a great deal of
confidence. Any situation mixing evaluation/research with
program administration is bound to run into difficulties, as local
manpower administrators discovered in the 1960s when they tried
to impose an evaluation design on the new programs. They
rediscovered similar conflicts late in the 1970s in an attempt to
accommodate the needs of evaluators assessing youth employment
and training programs.

The Labor Department's Neighborhood Youth Corps program,
adapted to the needs of rural youth, offers additional insight into
the pitfalls of the experimental approach. The Labor Department
initially set up three experimental projects corresponding to three
basic rural economies in northern Minnesota, southern Iowa, and
central Nebraska. The design was later expanded to include an
experimental subgroup of American Indians in Minnesota.

Unlike the New Jersey experiment, the rural youth project was
to have a short observation period in the hope of getting a quick,
conclusive verdict. As usual, events did not go exactly as planned.
Selection of the experimental and control groups took longer than
expected, and the addition of the Indian sample caused further
delay. Funding delays held up the start of the project for 15
months. A summer program component was run on a makeshift
basis in two states and omitted in the third. The school-year
projects were delayed until Christmas. First year operations were
beset by staffing and training problems, local suspicion, and a lack
of cooperation. The lack of uniform compliance with federal
guidelines required evaluators to deal with three distinct projects,
reducing the comparability of findings. The initially ambitious
experiment was transformed into a brief, fragmented project.
Needless to say, the results of what was supposed to be a fairly
definitive experiment were, at best, tentative.

There are some lessons for the evaluator to learn. It is difficult
to achieve controlled conditions that are adequate to justify the
choice of the experimental mode. Because the contractor could not
come up with an American Indian control group, the usefulness of
the final results for adapting the program to the unique needs of
Indian youth was seriously limited. The lack of uniformity among
the three projects made it difficult to generalize about which
approach to youth employment was effective in which kind of
economy. A variety of community, political, and social forces
disrupted well-considered plans and reasonable timetables.

44. Reid and Miles, ibid.

In principle, an experiment may be the most precise way to
measure the effectiveness of a social program, but in practice it is
difficult to implement. It is time-consuming and can inhibit
timely, innovative solutions. Ideally, eliminating ineffective
solutions may save money. But, when inherent delays postpone
the implementation of effective measures, the result is social
waste. The desire of decisionmakers to make careful,
well-documented decisions based on fact is commendable, but
considering the costs in time and lost opportunity, policymakers
may have to settle instead for intuition, normative judgments, and
experience. Social problems are too complex, their patterns too
elusive, and the need for action too pressing. Frequently, as welfare
reformers reconfirmed in 1978, pretensions of scientific rigidity
and the advocacy of experimentation may be pretexts for avoiding
action or substituting less costly programs.

Making Use of Evaluation Findings

The methodological challenge of evaluating social programs is
only half the battle facing practitioners. The other half is the
struggle to get evaluation findings into policymaking channels.
There are two obstacles to be overcome: the mechanical process of
managing disparate evaluation findings and the process of
generalizing some lessons from these findings.

Managing Information

The history of information management services goes back to
the libraries of classical times and the early encyclopedias. Modern
systems were developed in medical sciences, but only more
recently have the social sciences been given the same
treatment. The government's attempts to keep track of its social
program evaluation intelligence have yet to be developed beyond
the primitive stage. The Institute of Scientific Information
prepares the Social Science Citation Index. It offers literature in
practically all social science research as well as social program
evaluation. A more recent attempt to keep track of what has been
done in the field of social program evaluation has also taken place
outside the federal establishment: Databank of Program
Evaluation (making for the unfortunate acronym DOPE),
established in 1972 at the UCLA School of Public Health to
accumulate and analyze evaluations of programs in the mental
health and social action fields. The service identifies programs that
have been evaluated, provides a summary of the program,
describes how the evaluation was performed, and reports the findings.
The DOPE data bank is assembled from material published in
journals, contacts with selected experts, and a search of already
existing data banks.

Inside the government, indexing and referencing services whose
functions are to keep tabs on social science research findings and,
in particular, social program evaluation findings, have been
slower to catch on. This has not, however, been due to a lack of
attention to the need for information management.

HEW has initiated a number of systems to keep track of
progress in the evaluation area. The Evaluation Documentation
Center follows evaluations at all stages of progress. Its purpose is
to prevent duplication in the evaluation planning process and to
serve as a reference service for outsiders trying to track down
experts. HEW also operates Project Share, a technical assistance
information retrieval and dissemination system oriented toward
serving the information needs of HEW program administrators in
state and local governments. It was established primarily to keep
track of information bearing on program operation and program
management, and offers very little intelligence about the impact or
effectiveness of HEW programs. At HEW's bureau level, only the
Office of Education has attempted an elaborate information
management system: the Education Resources Information
Clearinghouse (ERIC), a comprehensive data bank of evaluative
research and other education literature. ERIC's most serious
weakness is that it contains an overabundance of detail, which
often makes it too cumbersome to be useful.

The Department of Labor has a lesser information management
problem than HEW. The department is smaller and narrower in
scope, and the bulk of its evaluation work has focused on
employment and training efforts. There, the need for a formal
system for keeping tabs on the state of the research art and the
information it yields has not been acute.



45. Daniel M. Wilner, et al., "Databank of Program Evaluations," Evaluation, Vol. 1,
No. 3, 1973, and Robert W. Hetherington, et al., "The Nature of Program Evaluation in
Mental Health," Evaluation, Vol. 2, No. 1, 1974.

Evaluation: For Investment or Consumption?

The lack of a clear distinction between evaluation and research
is easily discernible. The federal government's Office of
Management and Budget (OMB), marked by a predilection for
neat compartmentalization, has decreed that the distinction
between the two activities lies in the different roles they play in the
decisionmaking process. Consequently, all agencies engaged in
both evaluation and research must make administrative
distinctions between the two, although they are hard put to spell out the
differences.

In line with the distinction with little difference, OMB adopts a
fairly narrow concept of program evaluation as "a systematic
process of management which seeks to analyze federal programs
(or their components) to determine the manner in and extent to
which they have achieved (or are achieving) their objectives."
OMB generally views evaluation as confined to existing programs,
although some elements of policy analysis, specifically those
concerned with the estimation of the impact of program options,
fall under a broader definition. Within this rather confining
definition, the agency places great emphasis upon the "decision
relevance" of evaluations and the desirability of making
evaluations that contribute clearly to decisions. In other words,
evaluation findings should be applicable primarily to
program-related decisions about operations, funding levels, and policy
directions.

46. U.S. Office of Management and Budget, "Problems in Evaluation Design: A
Background Paper," October 1976, pp. 2-3, and "Evaluation Management: A Background
Paper," May 1975.

In the program agencies, though, evaluation is perceived as
serving somewhat broader objectives. Evaluation managers are
happy to point out the relevance and effects of their findings on
policy choices and other program decisions. But they also murmur
under their collective breaths that the search for a one-to-one
correspondence between evaluation findings and decisions based
upon these findings is not an appropriate indicator of how
effective evaluations are, and that any attempt to force evaluation
management into a dressed-up "management information
system" would be a mistake. In contrast to the OMB view of
evaluation as a process tailored to serve specific needs, the
managers of program evaluations in the executive branch see it
more as a learning exercise. They hope that through an inductive
process of successive reviews, larger patterns will emerge; that
evaluation findings will form a patchwork of new lessons to be
knitted together so as to contribute to larger, generalized notions
of social theory. As part of that larger "body of knowledge,"
evaluative findings build on the existing evidence to advance social
theory.

However, the claim that evaluation and research activities are
contributing cumulatively to the aggregation of knowledge
remains an assertion. Given only skimpy evidence that evaluation
produces immediate impacts, evaluators are forced to lay claim to
long-run achievements. It might be more accurate to say that the
immediate contributions of evaluations are marginal and their
long-run cumulative impact remains to be determined.

Building a Usable Base

A National Research Council report has criticized the research
and development program of the Department of Labor's
Employment and Training Administration for not establishing a
base of knowledge upon which future research can build.
Recognizing that knowledge building in a relatively new field is
bound to be uneven, it noted that "expanding and cumulative
effects cannot be obtained unless successive analyses of a problem
build consciously on earlier results." To Allen Schick, the
problem represented a failure to integrate evaluation findings and
policymaking. "There often is little follow-up to an evaluation;
once done, the case is closed and the evaluators move on to other
matters. Each evaluation is regarded as a discrete special event
rather than as part of an overall policy process."

In both instances, the underlying assumption is that evaluators,
like brickmasons, can lay a foundation and then build on it.
However, there is little evidence for such a claim. Merely drawing
parallels to experiences in the physical or life sciences does not
provide convincing proof that evaluation and research can be
developed into coherent parts of a systematized body of
knowledge for use in social policy. The "softness" and
unpredictability of social science phenomena contribute to this
difficulty by making learning, if it can be achieved at all, a slow,
tedious process. The mistaken assumption is that if a "critical
mass" of knowledge is gathered, progress will necessarily follow.
An airplane builder gets off the ground only after establishing a
base of understanding in mechanical engineering, aerodynamics,
basic physics, and materials properties. Knowledge does not start
accumulating until there is a base of theory and evidence in all the
component areas and the prototype airplane has successfully
gotten off the ground. The knowledge building goes on from the
original base and from what is learned in that first uncertain
flight.

Research in the life sciences shares the same problems and hopes
in pursuit of the accumulation of knowledge. In the early 1960s,
the Office of Science and Technology established a study group to
review research management policies and to recommend steps for
improving the return on research investments. One of the group's
findings was that a body of knowledge could not be started and
built in discrete blocks of knowledge or in discrete fields of study.
Because of a high degree of interdependence among areas of
research, the group noted, more meaningful conclusions rested
upon a synthesis of knowledge from more than one area. The
group concluded that before research could have any kind of
expanding and cumulative effect, there had to be an
undifferentiated mass of knowledge, a relatively unstructured body of
experience. From this base, it was asserted, a paradigm could be
fashioned and learning could go on, drawing from many sources
as needed and building upon what had already been discovered.

Whatever merits this learning process may have for the physical
and biological sciences, the approach is not promising for the
evaluation of social programs. The difficulty is that similar
conditions cannot be replicated in the evaluation of social
programs. Mechanical engineering principles and basic physics are
as applicable today as in the days of Faraday. Newton and
Einstein both lived with the same law of gravity. Social institutions
change, however, and evaluations of yesteryear remain only of
historical interest to today's evaluators.

However, that does not mean that evaluation and research 
activities can go off undirected or with only passing attention to
how findings can be incorporated into a generalized body of 
knowledge. Still, there is little reason to hope that there is much 
gold to be mined in old findings. Agency evaluation and research 
managers readily admit that they pay inadequate attention to the 
manner in which new efforts add to what is already known. The 
reason for this neglect may be the fact that the payoff is negligible. 



Institutional Arrangements

The changing evaluation literature has focused either on
substance of findings or methodological issues. But with only a
few exceptions, there has been little attention given to the
practitioners of the trade, and virtually no consideration given to
the influences that institutions wield in determining the purposes
of evaluations or the uses to which they are put.

The fact that evaluators have tended to pay little attention to the
arrangements of their trade is not necessarily due to excessive
modesty. It is more reasonable to assume that the neglect is due to
methodological factors. Institutional influences on evaluation are
hard to describe and impossible to capture in quantitative terms.
Selecting the variables to study, collecting data on them, and
interpreting the data all involve an enormous amount of
subjectivity. Under those conditions, even the illusion of
objectivity is hard to achieve. Any attempt to tackle the
institutional issues invites attacks (as the present authors can
testify) on methodological grounds and almost certainly will be
disputed as a misinterpretation of findings, a safe accusation
since any interpretation is arbitrary. These conditions reduce
substantially the incentives for examining the institutional issues.

But the more practical reasons for the neglect of the
institutional aspects of evaluation also cannot be ignored.
Evaluators are no more prone to bite the hands that feed them
than any other group of persons concerned about their own well
being. Since governmental agencies account for most of the
support for evaluation, there is no pecuniary return to most of the
practitioners for doing critical evaluations of their supporters.
Existing discussions of evaluation supporters tend to be,
therefore, public relations jobs, although exceptions occur. It
takes a courageous administrator of evaluation programs to
subject an agency to critical evaluation. An exception that can be
found in the literature is an evaluation of the Office of Research
and Development in the Department of Labor.

Institutional factors, whether or not they are recognized, have a 
pervasive influence on the federal government's social program 
evaluation policies. The purpose that evaluation serves, the way it 
is done, and the manner in which evaluation findings are 
incorporated into policy are directly affected by these forces. 

A useful starting point for an analysis of institutional factors
associated with evaluation is the constitutional roles that
differentiate the legislative and executive branches. In the most
formal terms, the legislative branch establishes policies and the
executive branch bears responsibility for implementing them. This
constitutional basis does not force a clear differentiation between
evaluations in the two branches. Indeed, in a case-by-case
analysis, there are probably more similarities than there are
differences in the way evaluation is undertaken and the findings
used.

The constitutionally differentiated roles, however, do establish
an underlying mode for checks and balances. There is an
inevitable tension between the two branches, and evaluation is
used as a weapon in the adversary relationship.

Evaluation takes on a number of forms in the legislative branch.
Congressional committees do it all the time. Their work ranges
from gumshoe assessments laden with anecdotal information to
more careful and systematic reviews. The General Accounting
Office does painstaking reviews of programs while the
Congressional Research Service recycles evaluation literature
prepared by the executive branch and others to produce a synthesis
of current views. The binding force among all the legislative
branch's evaluation work is its constitutional and institutional
oversight role. The legislative branch has the duty to keep tabs on
how the executive branch is carrying out congressional mandates.
But although it gives legitimacy to the congressional evaluation
role, the constitutional basis does not necessarily guarantee the
effectiveness of these efforts. The purpose of this section is to
examine how effectively Congress discharges those evaluation
responsibilities.

Congressional Oversight

Congressional oversight is nebulous but all encompassing. One
study of oversight activity lists no fewer than nine distinct
definitions. They range from the narrow "review after the fact" to
broad reviews of "almost everything members [of Congress] do,
e.g., legislating, gathering information, campaigning, etc."
Regardless of the definition chosen, program evaluation is clearly
a legitimate and important part of congressional oversight. In
1951, a member of Congress put it well, saying "I know of no
means whereby Congress can assert its authority over national
policies except through the expansion and improvement of
investigative powers." His sentiments were echoed a generation
later by another congressman in a world that was, as far as federal
social programs were concerned, light years removed from the
simpler days of post-World War II America: "The oversight or
evaluation function will allow Congress to begin to restore the
balance of power between the executive branch and the legislative
branch."

1. Harrison W. Fox, Jr., "Oversight: Is Congress Doing Its Job?" presented at
the 1974 Annual Meeting of the American Political Science Association.

Clearly, oversight is an integral part of congressional activity. 
And the more legislation Congress passes, the more activities there 
are to oversee and the more results there are to assess. A report on 
the operations of the Senate cited three reasons for improving the 
capability of the Senate (and Congress) to stay on top of issues: 
the expansion of the powers of the presidency that was occurring 
at the expense of the legislative branch; the increasing complexity 
of legislative issues; and the explosive growth of knowledge and 
information.⁴ That report, completed at the end of 1976, is one in 
a long list of commentaries coming from inside and outside the 
Congress calling for more and better congressional oversight, and 
evaluation. Whatever response emerges will be another addition to 
a growing list of measures taken to strengthen the oversight and 
evaluation capabilities of the legislative branch during the last 
several decades. The Legislative Reorganization Act of 1946 made 
specific reference to congressional oversight, mandating standing 
committees to ". . . exercise continuous watchfulness of the 
execution by the administrative agencies . . . ."⁵ The Legislative 
Reorganization Act of 1970 added specific provisions for 

2. George Meader, "Congressional Investigations: Importance of the Fact-Finding 
Process," University of Chicago Law Review, Spring 1951, p. 450. 

3. James J. Blanchard, Zero-Base Budget Legislation, in U.S. Congress, House 
Committee on the Budget (Washington: Government Printing Office, 1976), pp. 2-3. 

4. Toward a Modern Senate, U.S. Congress, Final Report of the Commission on the 
Operation of the Senate (Washington: Government Printing Office, 1976), pp. 42-43. 

5. Public Law 79-601, Section 136. 

obtaining agency budget and performance data.⁶ The Congressional 
Budget and Impoundment Control Act of 1974 
extended the authority of congressional committees to carry out 
oversight and evaluation activities and laid new mandates on the 
congressional support agencies.⁷ 

The much debated "sunset" provisions, calling for the 
automatic termination of programs unless congressional oversight 
demonstrates that they are attaining their goals, offer another 
approach. Their justification was summed up by a congressional 
proponent: ". . . it is clear that we need a procedure that will 
require careful scrutiny of every spending program to determine 
whether it is operating effectively or needs modification or 
elimination."⁸ 

However much Congress does about evaluation and oversight is 
surely not done in ignorance of the need for oversight. Evaluation 
and oversight have been given a great deal of lip service and are, in 
principle, virtues of universal appeal. But in a body given to as 
much rhetoric as the U.S. Congress, all pronouncements must be 
taken with the proverbial grain of salt. 

Congress obviously plays a leading role in shaping social policy. 
That has been the case in the last decade and a half especially, and 
it promises to be the case for the future, whether in an era of social 
program expansion or retrenchment. Granted, decisionmaking 
and policymaking will go on with or without oversight and 
evaluation procedures. But the quality of that policy formulation 
will hinge, to no small extent, upon the quality of oversight and 
evaluation activities supporting it. In short, the technical state of 
the art and the institutional arrangements for evaluation are 
crucial to the course of social policy. 

There are identifiable kinds of oversight that correspond to each 
of four congressional decisionmaking mechanisms: legislative, 

6. Public Law 91-510, Titles II and III. 

fiscal and analytical, investigative, and authoritative. Each type of 
congressional-based oversight can be incorporated into the 
corresponding decisionmaking structure. Legislative oversight 
includes hearings, meetings, and reporting requirements. Through 
fiscal and analytical oversight (the power of the purse), the 
appropriations committees, joined more recently by the budget 
committees, determine the level of funding for given activities. 
Investigative oversight is carried out to control and discipline 
particular executive operations. Authoritative oversight operates 
through the periodic review and amendment of authorizing 
legislation and the confirmation of presidential appointees. 
Impeachment is perhaps the ultimate oversight.⁹ 

Evaluation as Part of Oversight 

Program evaluation is particularly useful for legislative and 
fiscal oversight, shedding light on the effectiveness of alternative 
strategies or the effects of different funding levels on service 
delivery. But the quality and usefulness of evaluations are 
influenced not only by their methodology, but also by who is 
doing them. 

As large as the legislative branch is (535 members of Congress 
and over 18,000 staff in 1975), it does not function as a single 
hierarchy as do the many bureaucracies in the executive branch. 
Instead there are 535 different decisionmakers, each, theoretically, 
with the same power and each with an independent power base 
and distinctive constituencies. In practice, of course, some 
members are more equal than others and the differences are 
embodied in the congressional pecking order. But the process of 
decisionmaking is less clear than in a hierarchical structure, 
information needs are much more fragmented and diverse, and 
political partisanship is a powerful motivating force. 

Because of the large volume and complexity of the issues that 
come up for congressional consideration, there is an inescapable 
need for specialization and a division of labor. Most of the basic 
substantive work on legislation is done in committees and 
subcommittees. And although any member of Congress can speak 
before the full House or Senate, there is no guarantee that anyone 
will listen. The principal forum for congressional and public 
influence that is brought to bear on an issue is the committee or 
subcommittee. That is where specialization takes root and where 
critical legislative powers lodge. 

Evaluation by Congressional Committees 

Formal congressional evaluations of social programs are 
customarily done by one of the support agencies, but the work of 
the committees should not be glossed over. Like Molière's hero 
who spoke prose all his life and did not know it, congressional 
committees are continuously engaged in evaluation either as part 
of the regular authorization and appropriation process or as a 
distinct monitoring exercise. They lay out the important legislative 
issues and identify the crucial questions, providing specific 
guidance and instruction to the support agencies embarking on 
particular evaluation projects. The committees, in effect, act as 
the hub for congressional oversight and evaluation work. 

Evaluation has long been recognized as a legitimate and 
significant committee function. However, committees rarely 
indulge in detailed reviews. Usually, they tend to rely on the 
support agencies to grind out program assessments, but there are 
notable exceptions. 

The Joint Economic Committee's review of alternative income 
maintenance strategies, in which basic research was combined with 
evaluation, is an illustration of committee oversight at its best. 
The study was initiated after President Nixon introduced his 
Family Assistance Plan (FAP) in 1969. When the House Ways and 
Means Committee deliberations on FAP in 1970 and 1971 
produced more questions than answers, committee member 
Congresswoman Martha Griffiths became convinced that federal 
income support activities deserved thorough scrutiny. She used her 
position as chair of the Joint Economic Committee's Fiscal Policy 
Subcommittee to get the job done. The final product, Studies in 
Public Welfare, was three and a half years in the making. In 
addition to the 24 published staff studies, the output included four 
volumes of hearings, a summary report with recommendations 
approved by the subcommittee, and a draft bill advocating reform 
of public assistance.¹⁰ 

Studies in Public Welfare was an important addition to the 
burgeoning social welfare literature. Although the several 
monographs display a clear bias in favor of a guaranteed income 
scheme and strong opposition to in-kind aid programs, the total 
effort is a well-documented review of income support programs 
and a thorough analysis of the important issues in welfare policy. 
It spurred a more sophisticated level of debate and has had 
considerable influence in other respects. Its ultimate effect is 
uncertain, however. By spelling out in gruesome detail the pitfalls 
of alternative income maintenance strategies, the studies may have 
had the effect of squelching any initiative. With the options and 
corresponding costs laid out, different sides could see without 
further debate the problems they were up against. While inherent 
in any solution, the discovery of these problems was instrumental 
in killing enthusiasm for a major overhaul of the welfare system at 
that time. That net effect may have been for the good, although in 
a perverse way, considering the intent of Congresswoman 
Griffiths and her staff director, Alair Townsend, to champion a 
guaranteed income support program. Part of the stimulus for 
Studies in Public Welfare was the projection of continued 
exponential growth in the AFDC caseload as occurred in the late 
1960s, an erroneous assumption that failed to consider the fact 
that the system had already absorbed most of the potentially 
eligible population, and which ultimately tainted the entire study. 
In actuality any reform legislation based on the JEC 
Subcommittee's conclusions would have placed much of its focus on an 
illusory problem. 

The Senate Banking, Housing and Urban Affairs Committee 
evaluation of existing and proposed housing legislation of 1973 
serves as another illustration of detailed congressional oversight. 
It combined a dose of policy with proposals for new legislation. 
The evaluation consisted of several elements. First, the committee 
held oversight hearings following the President's imposition of a 
moratorium on new subsidized housing commitments under 
existing legislation. That was followed by hearings on a broad 
range of legislation introduced to improve housing in urban 
development programs, including an assessment of the housing 
and community development legislation proposed by the 
President. The final action was an analysis of the administration's 
position. The output of the evaluation effort was more than 3,000 
pages of testimony, a report, and a bill that was eventually 
incorporated into the Housing and Community Development Act 
of 1974. 

In both cases, the congressional subcommittee relied heavily 
upon outside help. Much of the background work for Studies in 
Public Welfare was done by executive agencies, the Congressional 
Research Service, the General Accounting Office, and nongovernmental 
experts working either without compensation or without 
contract.¹¹ The review of housing programs, although a more 
modest project covering a shorter period of time, relied heavily on 
support provided by the Congressional Research Service.¹² 
Although both committees were in central positions to direct the 
evaluation work, neither could have done its evaluations relying 
on committee staff alone. 

These two cases are the exception, however, not the rule. 
Congressional committees are not geared to conduct sustained 
evaluations, and the endorsement of evaluation in the Congressional 
Budget and Impoundment Control Act of 1974 is not likely 
to alter the situation. Since enactment, few consultants or 
contractors have been picked up by committees to do evaluation. 
The inclination appears to be to let the Congressional Research 
Service, the General Accounting Office, and executive agencies do 
the evaluations and then use the products as the basis for further 
committee deliberations and action. 

Limits on Committee Evaluation 

The dearth of direct evaluation by congressional committees 
does not indicate a lack of interest on the part of Congress but, 
rather, reflects priorities, practicalities, and political realism. 
Because support agencies can do evaluation for them, committees 
place a low priority on direct monitoring or evaluation. It is a 
matter of using limited resources where they will do the most 
good. A second, related reason is the constraint imposed by 
limited staff expertise. Given the broad domain of some 
committees, their small staffs, the informal obligations of staff to 
the senior members, and the marginal interests of some in 
committee work, it is impossible to obtain high quality coverage in 
all areas. Moreover, the propensity of outside experts to offer 
advice is well known. Most experts are available on call and are 
only too eager to appear before congressional committees to share 
their wisdom without compensation. Thus, the premium staff skill 
is not the ability to evaluate programs and to offer solutions, but 
to raise the right questions. 

The committees with jurisdiction over social programs have 
rarely utilized consultants for evaluation purposes. They prefer to 
let the Congressional Research Service, the General Accounting 
Office, or the Congressional Budget Office do the contracting, 
since committees are subject to inevitable political pressures to 
hire persons on the basis of who rather than what they know. 

The experience of the House Agriculture Committee's effort to 
evaluate pending food stamp legislation taught that lesson well. In 
1974, the U.S. Department of Agriculture contracted with 
Mathematica, a reputable research firm specializing in social 
legislation, to test the impact of changes in food stamp eligibility 
cutoffs and benefit levels on program clients. The first results of 
this analysis became available in late 1974. In mid-1975, shortly 
after the administration proposed food stamp reform legislation, 
the consulting firm examined the probable effects of the proposed 
legislation and found them to be less positive than promised by the 
sponsors of the legislation. The House Agriculture Committee 
then attempted to contract with the same firm to test for variables 
not considered by the Department of Agriculture. Political 
pressure doomed the project when the contractor was caught in a 
crossfire. The contractor's connection with the administration 
position made the organization suspect to some Democratic 
members, while conservatives were suspicious of the firm for an 
alleged liberal bias. The Agriculture Committee terminated the 
contract with Mathematica and relied instead on assistance from 
the Congressional Budget Office to assess the administration 
proposals and alternative approaches. 

There are other compelling reasons why committees do not do 
their own evaluations. Partisan pressures are inevitable. An 
in-depth congressional study concluded: "At some early point 
advocacy tends to take over from objective inquiry. Delineation of 
the problem, acquisition of related knowledge, discovery of 
interested parties, committee markup, and floor consideration 
follow one upon another."¹³ 

The seniority system and procedures for assigning members to 
committees are also inhibiting factors. The success of Studies in 
Public Welfare can be largely attributed to the interest and 
substantive knowledge that Congresswoman Griffiths brought to 
bear on the staff's work. But her being in the right place at the 
right time was a product of chance, not planning. Seniority does 
not necessarily go hand-in-hand with substantive expertise. 
Congress lacks an institutional mechanism to systematically tap 
the kind of interest, ability, and expertise necessary for program 
evaluation. Studies in Public Welfare was 
the product of a rare juxtaposition of executive initiative, 
congressional interest, and fortuitous committee assignments. The 
study probably would never have been undertaken if President 
Nixon had not proposed his Family Assistance Plan. Nor would 
the product have been as significant if the subcommittee had not 
secured an outstanding staff director with excellent connections in 
the welfare research community that stood ready to assist 
Congresswoman Griffiths in the exercise. 

Congressional committees may find social program evaluation 
risky; it is frequently considered a no-win activity. It involves basic 
questions about the limits of individual responsibility and the 
obligation of society to care for those who cannot or do not take 
care of themselves. Consensus is hard to come by, emotions often 
get the upper hand, and subjective judgments must be accepted in 
formulating methodologies. The choice of methodologies and 
simplifying assumptions, as well as variables and hypotheses to be 
tested, are all open to debate. The kind of bickering the House 
Agriculture Committee encountered in its halting steps towards 
evaluating changes in food stamp legislation is more typical of 
congressional committee experience with social program evaluation 
than the experience of the Joint Economic Committee. 
Furthermore, the latter had no legislative authority, which might 
mean that the other members left the conduct of the studies and 
the drawing of the conclusions to the chair, aware that it would 
not directly affect legislation. In short, proper evaluation is best 
left to nonpartisan congressional agencies that lack the authority 
to compete with legislative committees in sponsoring legislation. 

The reluctance of congressional committees to get involved in 
evaluations is understandable. But assigning the responsibility to 
the support agencies has inherent shortcomings. The most crucial 
advantage of direct committee control is the assurance that the 
evaluation will be relevant to policy choices being made by 
Congress. In discussing the need for control in directing research 
efforts, the staff director responsible for producing Studies in 
Public Welfare cited the value of a strong committee staff 
". . . not to control the findings, but to retain the desired 
focus."¹⁴ If evaluations are not kept on the targets Congress sets, 
and if they do not produce findings that can be applied during 
congressional decisionmaking, their net contribution may be 
marginal at best. 

THE CONGRESSIONAL BUDGET OFFICE 

The Congressional Budget Office (CBO) is a relatively recent 
addition to the legislative scene. It was established under the 
provisions of the Congressional Budget and Impoundment 
Control Act of 1974 to help the House and Senate Budget 
Committees prepare the congressional budget and to provide 
technical assistance to other committees on budget matters in 
particular and on economic matters in general.¹⁵ The idea was to 
create a congressional, nonpartisan think tank, supplemented by 
legislative committees within the traditional structure, which 
would be capable of providing nonpartisan analysis of the issues 
having an impact on the federal budget. 

The initial CBO director, in describing the office's role, 
emphasized the constraints bearing down on its operations: 

CBO won't have the kind of depth to do program 
evaluation, where we actually get in and collect original 
information on programs. We will try to help as much as 
we can, but can't take over and we don't want to take 
over the functions of the General Accounting Office, 
which does not only the auditing of programs, but deals 
with evaluative work.¹⁶ 

So far, the Congressional Budget Office has relied in its 
evaluation work exclusively on data already available from 
executive agencies, the General Accounting Office, or private 
sources. The bulk of its work has centered on broad-based policy 
analyses and comparisons of the relative effectiveness of 
alternative social strategies. For example, in studying youth 
unemployment problems, CBO delineated the nature and extent of 
teenage unemployment and laid out some options for dealing with 
the problems.¹⁷ But the paper did not address the specific 
feasibility of the options or break any new ground in evaluating 
the effectiveness of past strategies. Another study compared the 
relative impact of job creation and training strategies. Again 
relying on evaluations of specific programs, the CBO laid out the 
policy options and the budget implications of each major effort.¹⁸ 
In both, CBO integrated information obtained from past 
evaluations and other data to present policy alternatives. 

Some members of Congress have criticized CBO's restraint in 
program evaluation. One leading advocate of budget reform and 
the new budget process claimed that his support of a relatively 
large staff for the Congressional Budget Office was based on the 
anticipation that CBO would carry on extensive program 
evaluation activities. His complaint was a common one: that 
Congress had "never been staffed up adequately to really 
challenge the testimony of executive department witnesses. We 
have no way of going behind the scenes and seeing whether they're 
using the funds we've given them wisely and whether their requests 
for additional funds are justified."¹⁹ 

There are some obvious differences of opinion over just what 
the appropriate role of the Congressional Budget Office should be 
in evaluating federal activities. At the same time, there has been 
some confusion about the role of the other congressional support 
agencies. Both program evaluation and certain aspects of policy 
analysis are conducted by the General Accounting Office and the 
Congressional Research Service. Against that backdrop, the 
attempt to get the Congressional Budget Office involved in these 
activities might suggest that these agencies are falling down in their 
responsibilities, and that CBO should fill the void. Given the 
newness of the office and the expanding capability of the 
Congressional Research Service and the General Accounting 
Office, it would be folly to speculate about the precise turf that 
CBO will stake out for itself. Clearly, there is a strong demand for 
independent, substantive, congressional evaluations of federal 
programs with a minimum of reliance on executive help, and CBO 
as well as the two budget committees is playing a part in meeting 
that demand. 

THE GENERAL ACCOUNTING OFFICE 

A Changing Mission 

The Budget and Accounting Act of 1921 established the General 
Accounting Office and the Bureau of the Budget. The Act carved 
the GAO role out of the Office of the Comptroller of the Treasury 
whose responsibility was to audit all executive branch operations, 
making the GAO an independent agency reporting to the 
Congress. The GAO was given the narrow responsibility of 
auditing as well as authorization to review program administration 
and operation, while the Bureau of the Budget was given much 
broader scope to handle budget and financial management affairs 
for the executive branch. 

Until 1946, virtually all of GAO's activities involved financial 
auditing, with no formal oversight responsibilities and no 
substantive program assessment. Although the original GAO 
charter was sufficiently vague to authorize a wide range of 
activities, all of its work focused on keeping government officials 
honest by reviewing vouchers and assuring that money was spent 
properly. In fact, much of the value attributed to GAO's role was 
due not to its audit disclosures, but to the threat that the financial 
transactions of agencies would be scrutinized.²⁰ 

The Legislative Reorganization Act of 1946 and the Government 
Corporation Control Act of 1945 expanded the narrow GAO 
view of what constituted good government. The two acts reflected 
the growing feeling that good government required more than 
keeping civil servants' hands out of the till. The reorganization act 
delineated an oversight role for Congress and broadened GAO's 
mission to support congressional oversight. Besides reviewing 
expenditures to check whether federal funds were spent honestly, 
the law charged GAO with the responsibility to monitor 
appropriations to see whether they were being administered 
efficiently. The 1945 act broadened GAO's mandate, calling for 
more than checks on fiscal accountability. It set in motion a 
progression toward a broader definition of accountability to 
include management and program accountability. 

But institutions change slowly. The broader responsibilities 
assigned by Congress notwithstanding, GAO continued for the 
next two decades to confine its work largely to financial and 
management audits. The surge of Great Society programs during 
the 1960s generated the need to put GAO into the role of social 
program evaluator. These controversial and multi-faceted 
programs demanded more substantive evaluation and, in some 
cases, established an explicit requirement for it. If a single event 
indicates the expansion of GAO evaluation responsibilities, the 
1967 amendments to the Economic Opportunity Act requiring 
GAO to evaluate the antipoverty programs might be the proper 
mark. Either because the agency lacked confidence in its own 
capability to undertake this pioneer investigation, or because it 
sought to broaden the scope of its investigation, the GAO relied 
on contractors for guidance and assistance. But the final product, 
the Review of Economic Opportunity Programs, presented to 
Congress in 1969, was the first comprehensive evaluation of a 
social program done by GAO. 

Reflecting this expansion in responsibilities, the GAO staff 
capability also changed over the years. Before 1946, the bulk of 
the agency's activities consisted of routine work determining 
whether or not expenditures were allowable and matching 
expenditures to supporting documentation. The work was done 
mostly by a technical staff of bookkeepers under the supervision 
of accountants and attorneys. 

In 1946, when GAO's explicit mandate changed, Congress made 
new demands on the staff. Besides checking whether expenditures 
were authorized and documented, GAO had to audit accounts and 
programs to assure that government funds were used economically 
and efficiently. To carry out its responsibilities, GAO had to 
bolster its staff capabilities for comprehensive audits by 
employing more accountants. By the mid-1960s (before the shift 
to more substantive program evaluation), auditors had come to 
comprise, along with accountants, roughly 90 percent of the 
professional staff. 

The laws charging GAO with the responsibility to evaluate 
federal programs also drastically affected the bureaucratic 
structure of the agency during the 1960s. When GAO was set up in 
1921, six operational divisions were transferred from the Treasury 
Department, each corresponding to an executive branch agency: 
the Treasury, the War Department, the Navy, Interior, the Post 
Office, and the remaining departments. As the executive branch 
grew, GAO's primary audit work continued to be done along 
agency lines. 

Administrative Structure 

Compared to its first 45 years, the following decade of GAO's 
history was turbulent. Congress made major changes in GAO's 
statutory authority which had a far-reaching effect on the agency's 
organization, staff, and scope of activity. The changes in the GAO 
evaluation role have been especially sweeping. The forces behind 
them are plentiful and complex. Some may have been an inevitable 
function of natural growth and a changing environment, while 
others have been the product of more deliberate legislative action. 

A review of GAO's transformation from an auditing agency to 
its broader current role might best start with the evolving 
congressional mandate. Comptroller General Elmer Staats, who 
took over direction of the GAO in 1966, was more than 
cooperative in transforming the role of the agency; but the two 
laws instrumental in pushing GAO into an expanded role were the 
Legislative Reorganization Act of 1970 and the Congressional 
Budget and Impoundment Control Act of 1974. 

Under the 1970 act, GAO's responsibilities were extended 
considerably beyond the traditional financial and management 
auditing. GAO was required to standardize budget and fiscal data 
in coordination with the newly-designated Office of Management 
and Budget (the former Bureau of the Budget) and the 
Department of the Treasury. The Act also required GAO to 
evaluate results of programs and to develop the capacity for 
preparing cost-benefit studies. It underlined the increased 
importance Congress attached to an aggressive evaluation role for 
GAO and removed any element of choice that the agency may 
have had in the matter. This new direction marked a distinct shift 
away from a preoccupation with auditing and towards a greater 
emphasis on evaluating programs and the effect produced by 
government spending. 

The Congressional Budget and Impoundment Control Act of 
1974 reinforced and articulated further the evaluation role of 
GAO by providing for an easier flow of budget and fiscal data as 
well as specific program information. More specifically, it 
authorized GAO to set up an office for program evaluation and 
review to help GAO assume a leadership role in legislative branch 
evaluation by requiring the agency, through the office, to assist 
congressional committees, the Congressional Research Service, 
and other GAO divisions doing evaluation, and to recommend 
general strategies and tactics for carrying out program 
evaluations. 

In 1972, the GAO divisions based on the executive structure 
were abolished and replaced by six functional divisions. Prior to 
this change GAO studied each department's programs separately, 
although analysis occasionally did cross departmental lines. After 
the reorganization and the concomitant concentration on 
functional areas, appropriate crossing of department jurisdictions 
was considerably easier. At the subdivision level, though, work 
remains organized along agency lines. 

Some important behind-the-scenes shifts accompanied the 
formal changes that culminated in the 1972 reorganization. In 
1969, the Comptroller General began putting more emphasis on 
systems analysis and operations research. This interest was partly a 
carryover from the Programming, Planning and Budgeting 
System and Management by Objective strategies aimed at 
systematizing federal management and decisionmaking that were in 
vogue at the time. But Comptroller General Staats also planned a 
broader, more sophisticated role for GAO in program evaluation. 
It was recognized that achieving this goal would require GAO to 
upgrade its capabilities markedly. The Congressional Budget and 
Impoundment Control Act generated more impetus to diversify 
the qualifications of GAO personnel. The Act called for intensive 
budget analysis, which Congress was not then equipped to handle. 
Anticipating the demand, GAO created a new Office of Budget 
and Program Analysis, assigning it the task of developing a 
capability for extensive quantitative and policy analysis, much 
headier stuff than checking vouchers and auditing financial 
transactions, to be sure.²¹ However, the budget analysis duties 
were temporary. After less than a year, the new Congressional 
Budget Office took over the tasks. Having been relieved of budget 
analysis, the GAO strengthened its policy analysis role by 
channeling more resources into program evaluation techniques. 

Another change in the GAO structure and operations was the 
adoption in 1975 of the lead division mechanism.²² It was 
introduced to serve as a clearinghouse for activities that do not fall 
neatly into a single GAO division, but straddle two or more, either 
because broad issues or two or more government agencies are 
involved. One division is assigned prime responsibility for an 
activity that cuts across GAO divisional structure. The mechanism 
has been instrumental in filling in the cracks left by the shift from 
an agency to a function orientation, and has facilitated the 
investigation and evaluation of cross-cutting policy areas. 

GAO proves the theory that organizational structures can 
change more frequently and quickly than the people who fill them. 
In 1972, 0.2 percent of GAO's professional workforce were 
classified as social scientists, compared with 6 percent six years 
later when only half of new hires were accountants and auditors. 
However, even in 1978, nearly two out of every three professional 
staff members were still accountants and auditors. 

GAO has brought in some outsiders at the top since 1970. Prior 
to that time, practically all middle and upper management 
positions were filled from within GAO, and they were virtually all 
accountants or auditors. During the 1970s, an increasing 
proportion of top-level vacancies has been filled by outsiders, 
including social scientists, psychologists, and other professionals 
needed for the new GAO role. 

Because of the complexity and also, no doubt, the political 
hazards of social program evaluation, GAO relies extensively on 
consultants to help bolster its in-house expertise and to help give 
the GAO products credibility. They are brought in at a relatively 
low cost to fill specific skill shortages in GAO and to bring a 
fresh perspective to subjects under analysis. 

GAO officials prefer consultants to contractors because they 
cost less, and because they permit more control by GAO over the 
end product. Consultants are paid only for the time they work. 
They typically are engaged in projects already planned by GAO in 
which basic parameters have been established, or they appraise 
completed drafts. They act as sounding boards and critics for 
ideas that have already been circulated in GAO, help to maintain 
the professional standards of agency work, and provide an 
independent perspective on the problems under examination. 
Unlike contractors, they do not provide a final report, evaluation, 
or study design. 

Officials also feel that they are more likely to get frank 
judgments from consultants. Since federal personnel practices 
impose a limit on the amount of work consultants can do for GAO 
in any one year, consultants cannot become too dependent on this 
work. Hence they are more likely to take an independent position 
than is often the case in the executive branch where agencies and 
contractors develop long term relationships. 

In contrast, GAO has used contractors sparingly in evaluating 
social programs, in the belief that by using its own staff it can keep 
better control over methodology and findings and can better 
assure an acceptable evaluation. It prefers to rely on its own staff, 
even to the point of sacrificing time and money to achieve 
thoroughness and credibility. In the 1969 study of the Office of 
Economic Opportunity, GAO did contract for independent 
analyses of selected programs and an across-the-board review of 
similar executive branch studies. Contractors also reviewed the 
adequacy of the information systems used in program operation, 
and in some cases interviewed program participants.23 However, 
this evaluation was exceptional. GAO turned to outside help 
because the terrain was unfamiliar. It was also probably necessary 
to give GAO's evaluation more credibility: GAO was not famous 
for the quality of its social scientists and, therefore, an internal 
evaluation of OEO programs might have been suspect. 
Nevertheless, in that case, as in the few others where GAO used 
contractors, the outside contributions were subsumed in the final 
report as a GAO product. This practice can be contrasted to 
executive agencies, which routinely spend millions of dollars for 
outside evaluations that are signed, sealed, and delivered as final 
products, with a minimum of structured intervention by 
government officials. 

The GAO preference for consultants over contractors also 
reflects certain practical considerations. Procurement procedures 
for contracts over $10,000 are complex and time consuming. 
When GAO decides to contract, it usually solicits outside help for 
relatively small pieces of larger studies; contracts over $25,000 are 
rare. When executive agencies are routinely soliciting bids for 
studies costing ten times that amount and more, GAO finds 
relatively few evaluators competing to do its jobs. On the other 
hand, short term reviews and outside commentaries can be 
handled by busy social scientists on a consulting basis, and these 
pay enough for shorter periods of time to make it worth their 
while. 

The Quality of GAO's Evaluation 

With a staff in excess of 5,300 persons, GAO is by far the largest 
of the congressional support staffs. The combined staffs of the 
Congressional Research Service, the Office of Technology 
Assessment, and the Congressional Budget Office are 
outnumbered five-to-one by GAO's. GAO has a long standing 
reputation for reliability and integrity, the ideal watchdog. But 
the field of social program evaluation is a new one. While 
Comptroller General Staats was drawn into it willingly, indeed 
eagerly, many of the middle management and some top 
management personnel had to be dragged into the uncertain art of 
evaluation. Trained accountants find the work soft and the 
bottom line disturbingly ambiguous, if not illusive. 

The strength of GAO's performance has rested partly on its past 
reputation as an objective overseer, but much more on its ability 
to adapt to a more complex and politically contentious branch of 
analysis. GAO has made organizational adaptations. New 
leadership has shifted the emphasis of its work and it is acquiring a 
different kind of professional staff to handle a different kind of 
job. In short, faced with new responsibilities, GAO has changed. 
But has the change enabled GAO to serve congressional 
decisionmaking needs better? 

GAO's location in the congressional branch imparts a 
distinctive quality to its evaluation. In response to congressional 
inquiries, Comptroller General Staats has spelled out the 
respective role of GAO and the executive agencies in program 
evaluation: 

It is our view that program evaluation is a fundamental 
part of effective program administration. The responsi- 
bility, therefore, rests initially upon the responsible 
agencies. However, in our opinion, the executive 
agencies too frequently issue reports without adequate 
consideration of congressional needs. . . . The GAO can 
help to identify these needs for consideration by the 
agencies. 

The GAO can assess the objectivity and validity of 
agency studies. . . . We believe the Congress and GAO, 
as an arm of Congress, should also have capability to 
make evaluations of programs. The GAO reviews and 
evaluations of programs should not, however, supplant 
the agencies' responsibilities in this area.24 

At one level of analysis, the effectiveness of the job GAO does 
evaluating government social programs hinges upon the 
methodological adequacy of its work and the relevance and utility of the 
findings. But evaluation is not GAO's sole function. GAO is first 
and foremost a congressional support agency. Because of that, 
GAO's evaluation activities are constrained by the reality that they 
must meet the particular needs of the Congress, not of program 
administrators or social scientists. Methodological correctness 
may at times have to be sacrificed to the imperatives of timeliness 
and to the need to pass judgments even when there is inadequate 
information for the formulation of sound evaluations. 

In the evaluation of social programs, the General Accounting 
Office labors under the handicap that staff members who do the 
work are, for the most part, not professionally trained social 
scientists. But what they lack in technical know-how and depth of 
program experience, GAO evaluators often make up for with 
the scope of their experience, abundant resources, and 
investigative authority. The variety of experience many GAO 
evaluators have gained is useful in a legislative branch setting. 
Given the range of demands upon GAO, breadth of experience is 
probably more important than depth. Nonetheless, considering 
resources available to GAO, the agency should be able to strike a 
better balance between the employment of generalists and 
specialists. 

The art of social program evaluation, primitive as it is, has 
progressed in the last several years. Compared to other social 
program evaluations, GAO's efforts are sometimes marked by a 
lack of methodological polish and by inadequate technical 
understanding of important program features and context. The 
sampling methods GAO employs in identifying particular projects 
for study and in collecting program data are sometimes open to 
serious question. 

The executive agencies that are the subject of GAO's 
investigations frequently attack GAO findings on methodological 
grounds. Although the criticisms are self-serving, they are not 
unfounded. For example, HEW officials, reacting to a critical 
GAO evaluation of compensatory education, questioned the 
validity of GAO's criticism with charges of improper sample 
selection, insufficient sample size, incomplete data collection, and 
faulty data analysis.25 

Department of Labor officials challenged the findings of a 
GAO evaluation of summer youth employment programs also on 
the basis of its weak methodology and faulty process of 
inference.26 As in the case of the HEW compensatory education 
program, Labor's officials cited their own evaluations that 
covered larger samples and presented a markedly different picture 
of program effectiveness.27 

Given the potential impact of any GAO evaluation, findings 
have to be qualified and recommendations have to be made with 
caution where the methodological underpinnings or sampling 
procedures are weak. But this presents a dilemma for the GAO. 
Required to respond to congressional requests, GAO must 
sometimes settle for evaluations based on weak methods on the 
grounds that a weak review is better than no review at all. The GAO 
evaluation of compensatory education was undertaken because HEW 
evaluations were not complete and the Congress was demanding 
an assessment of program performance. But the evaluation of the 
summer youth employment program was undertaken despite the 
fact that a number of DOL assessments of the same program were 
already underway at the time and scheduled to be completed 
before GAO's. GAO made reference in its review to the other 
evaluations, but neither critiqued them nor utilized any of their 
findings. 

Where the GAO work merely clutters the landscape with one 
more opinion of suspect value, it is dubious whether its 
contribution helps Congress. Where GAO evaluations are the only 
assessments of programs, GAO ought to state explicitly the limits 
on inferences that may be drawn from its assessment. To do 
otherwise is to invite unwarranted extrapolations and imperil 
legitimate findings. 

Aside from in-house resource constraints that impinge on the 
methodological and substantive sophistication of GAO's work, 
there is an important institutional constraint. Being in the 
legislative branch, GAO is removed from the program operation 
perspective and the advantages that go with that perspective. It cannot 
readily manipulate variables or gear program management and 
information collection to its evaluation needs. Nor can the GAO 
evaluators actively participate in program design or operations as 
do their counterparts in the executive branch. They are, at best, 
remote observers and, indeed, are frequently considered intruders; 
they work without enjoying the welcome mat that is laid out for 
executive evaluators. The most marked disadvantage GAO 
evaluators work under occurs in test program evaluation 
activities. Executive agency evaluators can design the program to 
support evaluation objectives and can monitor its entire life. In 
contrast, the GAO role in such experiments, as in other programs, 
is that of an outsider. For example, while Mathematica staff, 
under contract with HEW, were directly involved in setting up and 
operating the New Jersey Graduated Work Incentive program, 
GAO staff came in after the fact to review what had been done, 
comment on technical aspects of the design, and draw their own 
policy conclusions from the evidence available to them. 

GAO evaluators do not have the same flexibility or opportunity 
for cooperation and interaction with program staff and clientele as 
executive branch evaluators. In trying to measure the effect of the 
HEW compensatory education program on academic 
achievement, they relied exclusively on available student records, which 
were incomplete and not designed to assess these effects. In 
contrast, HEW evaluators were able to utilize their own battery of 
tests, administer them to program enrollees, control groups, and a 
cross section of other students. They were also able to specify the 
timing of additional data that were to be collected. Their data were 
designed to answer their questions, while the data GAO had to use 
were not.28 

Self-imposed constraints also affect the quality of GAO's work. 
Hoping to achieve independence by maintaining distance from 
and minimizing interference with program operations, GAO 
evaluators find themselves frequently building analyses upon only 
the data that are available from existing records. The data are not 
always appropriate or reliable for the purpose of answering 
questions put to GAO. This dogged insistence that its work be 
original has costs that may not actually be balanced out by the 
presumed benefits. Resources are wasted reinventing the wheel, 
and while GAO can vouch for its work as far as it goes, the work is 
inevitably restricted because the analysts cannot hope to develop 
comprehensive pictures from limited program records. In a review 
of programs for migrants and in a study of public service 
employment programs, the GAO failed to draw on the wealth of 
available literature on the subjects. 

The recently established Program Analysis Division has broken 
this mold, showing a great willingness to review and synthesize 
findings from other literature. Its analyses, where they focus on a 
particular program, go further in speculating on future program 
impacts and policies in the program area than do those of other 
divisions.29 But the work of the division is unique, not typifying 
the style or substance elsewhere in GAO. Furthermore, analysts in 
this division frequently do not evaluate particular programs in 
terms of the specific laws and regulations governing their 



28. Departments of Labor and Health, Education and Welfare Appropriations for 1977, 
Part 5, p. 101. 

29. Comptroller General of the United States, Section 236 Rental Housing: An 
Evaluation With Lessons for the Future (Washington: General Accounting Office, 1978). 
implementation, instead focusing on policy areas and future 
options that encompass a number of laws.30 

By insisting on preserving its independence and, in particular, 
failing to adequately acknowledge other literature and incorporate 
it where appropriate, the GAO divisions that do the vast majority 
of the social program evaluations may be forcing their work into a 
strait jacket that reduces the effectiveness of their work. GAO 
tends to ignore the legislative and administrative agendas behind 
social legislation and oversimplify the reality in which social 
programs are implemented. The work rarely questions the 
practicality of congressional mandates and pays too little attention 
to the inevitable difficulties inherent in the implementation of 
social policies. 

The insistence upon independence for financial auditing is, of 
course, justified. But elsewhere, the limitations this puts on GAO 
reduce the usefulness of its products. The benefit of independence 
in evaluating the complexities and nuances of intricate social 
programs is ambiguous at best. As Selma Mushkin has observed: 
". . . in the strength of its isolation from Government, [GAO] 
may also find it is removed from the realities of governments, or 
even in its isolation, produce an environment hostile to change."31 

In addition to the problems caused by its isolation, GAO's work 
has also been marked by unimaginative analyses that sometimes 
oversimplify reality. Although there have been significant 
improvements recently, much evaluation has taken legislative 
rhetoric literally and judged agency performance on the basis of 
vague or unrealistic legislative goals. GAO evaluators have 
frequently made no attempt to assess the lofty aspirations of 
lawmakers in light of realistic, operational impediments faced by 
administrators. Their analyses sometimes fail to come to grips 
with the legislative and administrative problems at the core of the 



30. Comptroller General of the United States, Inconsistencies in Retirement Age: Issues 
and Implications (Washington: General Accounting Office); and An Evaluation of 
the Use of the Transfer Income Model (TRIM) to Analyze Welfare Programs 
(Washington: General Accounting Office, 1977). 

31. Selma J. Mushkin, in Legislative Oversight and Program Evaluation, pp. 23^2^3. 
social initiatives. They have recommended adjustments within 
the confines of assumption-laden legislation, but failing to see the 
forest for the trees, have not done a good job of assessing the 
validity of these assumptions. 

In a 1972 review of federal employment and training programs, 
GAO evaluators reported that the Neighborhood Youth Corps, a 
program established under the Economic Opportunity Act of 
1964, was not achieving the legislative objective of reducing the 
high school dropout rate. While the analysts recognized the fact 
that the dropout problem was significantly more complex than the 
Neighborhood Youth Corps could address, the GAO evaluators 
based their assessment on how well program results conformed to 
the literal goals of the enabling legislation.32 Similarly, in 
analyzing other antipoverty employment and training efforts, 
GAO placed more emphasis on the letter than the spirit of 
legislative mandate. Its assessments were mechanical 
comparisons between program records and statutory requirements. 
Evaluators paid scant attention to the program environment and 
the host of conditions affecting the program and its participants.33 

The General Accounting Office has made, however, 
considerable progress on this front. The quality of its analysis has 
improved in recent years and shows promise of continued 
progress. In a 1974 review of activities under the antirecessionary 
job creation program, GAO analysts cited the congressional 
rhetoric, but went on to examine the conditions and practical 
obstacles faced by employment projects.34 In its review of federal 
programs for migrant and seasonal farm workers, the GAO 
examined the needs of the target population and made an 



32. Comptroller General of the United States, Federal Manpower Training Programs: 
GAO Conclusions and Observations (Washington: General Accounting Office, February 
17, 1972). 

33. Comptroller General of the United States, Effectiveness and Administrative 
Efficiency of the Concentrated Employment Program Under Title IB of the Economic 
Opportunity Act of 1964: St. Louis, Missouri (Washington: General Accounting Office, 
November 20, 1969). 

34. Comptroller General of the United States, The Emergency Employment Act: Placing 
Participants in Nonsubsidized Jobs and Revising Hiring Requirements (Washington: 
General Accounting Office, March 29, 1974). 
imaginative assessment of how the programs were affecting the 
intended beneficiaries.35 In much the same vein, a more recent 
GAO report of public service employment programs reviewed the 
unemployment problems these programs seek to address and the 
local forces with which they must contend. It assessed the impact 
of the programs in their real-life environment and indicated how 
improvements could be made.36 

Effects in the Legislative Branch 

Considering GAO's mission to assist the Congress, its overall 
impact ought to be measured primarily in terms of the effects that 
its evaluations have on policy and legislation. In the distinctively 
political environment of the legislative branch, GAO offers 
nonpolitical support, striving to provide objective and indepen- 
dent evidence of program performance. Yet to many observers, its 
evaluations of social programs appear to have had little 
appreciable impact in swaying opinions or changing the course of 
policy. Some of the reasons lie in GAO and some in the structure 
of the Congress. 

Within GAO, there is a persistent tendency to confine 
recommendations to minor issues involving incremental changes. 
A typical recommendation urged greater reliance on elementary 
school facilities in allocating funds for adult education. Another 
recommendation sought to improve the transferability of findings 
from an experimental housing allowance program to other 
housing programs. None of these recommendations was world 
shaking; most suggested technical improvement in policy 
execution and program administration. One extensive list of 
recommendations dealt with upgrading the administration of 
vocational education programs at all levels of government. 
Another offered technical changes in vocational rehabilitation 
legislation to bring it in line with policy established in other 



35. Comptroller General of the United States, Impact of Federal Programs to Improve 
educational legislation. Some of the recommendations are so 
obvious as to be unnecessary; some urge actions that agencies have 
already taken. In an evaluation of a bilingual education program, 
agencies accepted the recommendations while objecting to the 
analysis leading up to them. 

Even where GAO is inclined to offer bolder policy suggestions, 
Congress may be reluctant to heed the advice. This is sometimes 
inevitable, not only because GAO is still new to a game in which 
the competition is keen, but also because Congress is not short of 
paid and unpaid, formal and voluntary, advisors and lobbyists. 
Executive agencies, public interest groups, and constituents are 
constantly vying for the opportunity to present their views. 
Members of Congress are not accustomed to asking GAO for 
policy advice, and, given the competition, the GAO is likely to 
remain on the sidelines when decisions are made. 

By one count, Congress acted on only 17 recommendations 
made by GAO in the social area over a three-year period. All were 
directed at improving aspects of program management; none 
reflected findings about basic program impact or involved changes 
in policy.37 However, an assessment of GAO effectiveness on the 
basis of direct congressional response to its recommendations 
would be misleading. Congress watchers might argue that action 
on legislative recommendations is an inadequate indicator for 
judging any group's effectiveness.38 Indeed, Congress rarely acts 
on recommendations from any single source. Furthermore, GAO 
is rarely included in the decisionmaking process. If GAO's 
evaluation work has not been eagerly received by Congress, the 
reason may be that Congress, in the past, has not pursued its 
oversight responsibilities effectively. Emphasizing the passage of 
new legislation, Congress has expended little effort in monitoring 
programs or checking whether early policy has been appropriate. 
"Congress is oriented to the future, not to the past, so there is a 
chronic neglect of its oversight role."39 

37. Comptroller General of the United States, Annual Reports for 1975, 1976, and 1977 
(Washington: General Accounting Office), p. 13, pp. 1M7, pp. 9-15, respectively. 

38. Legislative Oversight and Program Evaluation, op. cit. 

39. Allen Schick, "Evaluating Evaluation: A Congressional Perspective," in Legislative 
Oversight and Program Evaluation, p. 345. 

Another factor that impinges on GAO's work is the coziness 
between congressional committees and executive agencies. The 
interest in oversight notwithstanding, senior members of 
committees or subcommittees may get defensive about legislation 
they have sponsored and may not be inclined to welcome GAO's 
criticisms. As a result, GAO evaluators are often left out in the 
cold when they lack rapport with, and access to, senior members.40 

The story of GAO's contribution to legislative oversight is not 
entirely bleak, however. Aside from the few, but interesting, 
cases when GAO social program evaluations have had an 
immediate influence, GAO work does, in conjunction with other 
evidence, make a difference in the long run. It often sparks 
interests, develops leads, or establishes facts that serve as 
jumping-off points for committee staff work.41 Perhaps the most 
encouraging evidence of congressional interest in GAO 
evaluations is the active encouragement it is given by committee staffs. 
Although GAO is not the first choice to evaluate social programs, 
the consensus of opinion is that its mere presence exerts a strong 
influence on executive evaluators to do a more credible job. 

In the separation of powers between the legislative branch and 
the executive branch, the latter has traditionally assumed the lion's 
share of program evaluation. But that is changing. The 
Congressional Budget and Impoundment Control Act of 1974, 
which included a mandate for vigorous congressional oversight, 
was spurred largely in response to perceived executive abuses and a 
lack of legislative evaluation capabilities.42 The sudden interest in 
"sunset" laws and the revived interest in zero-based budgeting 
reflect similar sentiment. Interest in legislative oversight seems to 
be on the upswing. While the effectiveness of GAO evaluations 
has been limited, there are encouraging signs for the future. 

Effects in the Executive Branch 

While GAO was established to serve the Congress, the fallout of 
its evaluations upon executive agencies cannot be ignored. Indeed, 

40. Ibid., p. 187. 

41. Ibid., p. 117. 

42. Congressional Quarterly Weekly Report, April 28, 1973, pp. 1013-1018. 
in addition to evaluating at the behest of committees and members 
of Congress, the GAO may also respond to requests of agency 
heads to review their operations. However, few agency heads ask 
GAO to evaluate their programs. Instead, the primary vehicles of 
GAO influence are the recommendations made in GAO reports 
and the subtle "threat" of GAO's presence acting to keep 
administrators clean. 

Virtually all GAO program evaluations include 
recommendations to program officials. Although the recommendations are 
aimed at clarifying congressional mandates and improving policy 
implementation, the GAO too frequently restates the provisions 
of the law establishing the program. It is not surprising, therefore, 
that following a path of least resistance, agencies usually concur 
with GAO recommendations, while seldom changing their policies 
or plans. The agencies assume that once the report is filed, GAO 
will not return for a while and that, also, there will be no organized 
follow-up on the GAO recommendations. This is not to deny that 
the GAO reviews are sometimes especially perceptive and 
constructive. But, as a rule, they make no new discoveries and are 
modest in comparison to the more comprehensive evaluations 
conducted by the agencies. 

GAO's administrative recommendations are much more 
carefully heeded. In its historical field of expertise, program and 
financial management, GAO has brought unique talents to bear 
and its findings have had marked influence. It was especially 
effective in analyzing the administrative problems that were 
immobilizing the federal employees disability compensation 
program, and was the prime mover in forcing passage of the 
Federal Employees Compensation Act of 1974. GAO was also 
instrumental in identifying administrative snags in the food stamp 
program, Aid to Families with Dependent Children, and the 
Supplemental Security Income program. 

The specter of the "watchdog" looking over the executive 
branch shoulder is also important. In the opinion of some 
executive branch observers, program personnel and evaluators 
alike, the most compelling effect of GAO evaluations is not in 
their actual, but their potential content. More bluntly, GAO hangs 
as Damocles' sword ready to fall on the heads of program 
administrators and agency-supported evaluators. Executive 
personnel frequently question the soundness of GAO evaluations 
and rely more on their agencies' own comprehensive studies. But 
the GAO acts as a check; there is little likelihood of an agency 
evaluation being taken seriously if it contradicts GAO findings, 
crude as they may be. It also discourages evaluators from settling 
for a whitewash to avoid GAO exposure. "If bureaucrats 
anticipate that their actions will be inspected by other units of the 
bureaucracy, by the Congress, and perhaps by the courts, they are 
more likely to act with a sense of responsibility."43 

On the whole, however, GAO's contribution to the formulation 
and implementation of social policy has been minimal. The agency 
has lacked substantive program knowledge and adequate 
understanding of the environment in which social programs 
operate. Measures to upgrade and broaden staff quality take time, 
and experienced technical personnel in analytical areas are always 
in short supply. Finally, GAO still has a reputation as a "keep 'em 
honest" watchdog. It remains dominated by an "accountant 
mentality" for various reasons, including tradition, the 
preponderance of accountants on the staff, and the fact that even the best 
social science evaluation is a politically and intellectually 
hazardous enterprise about which the experienced and prudent 
GAO leaders are understandably cautious. Legislators are 
accustomed to GAO doing financial auditing and have 
traditionally turned to other agencies and private experts for the 
evaluation of social programs. With the rising prominence of the 
Congressional Budget Office, GAO is still usually not their first 
resort when a program evaluation is needed. 

The Congressional Research Service 

An Expanding Mission 

Established in 1914 as the Legislative Reference Service, the 
Congressional Research Service (CRS) is the senior congressional 

43. Morris S. Ogul, Congress Oversees the Bureaucracy (Pittsburgh: University of 
Pittsburgh Press, 1976), p. 192. 
support agency. It is unique, acting as a nonpartisan, scholarly 
agency also thoroughly involved in virtually all aspects of 
lawmaking. In contrast to the General Accounting Office, which 
has its own accounting and fiscal management responsibilities 
apart from its legislative support tasks, CRS is entirely an 
appendage to the Congress. 

In the evaluation of social programs, other features also mark 
the Congressional Research Service as unique. No stranger to the 
social sciences, CRS has for years relied on the work of academic 
scholars and their familiarity with federal social initiatives. The 
CRS staff has included prominent social scientists since its 
predecessor, the Legislative Reference Service, broadened its 
emphasis during the 1940s beyond indexing and referencing to 
include all aspects of federal policy analysis. CRS now has greater 
depth and scope of experience in the social science disciplines than 
ever before. But the agency's role in evaluating social programs is 
a limited one. CRS staff are well-equipped to analyze and digest 
the findings of social program evaluations, and do so in the 
normal course of their work. They are not technically responsible, 
however, for the direct evaluation of federal social programs. In a 
seminar on congressional oversight and evaluation, a former CRS 
top officer made note of the ambiguity: 

There is really no significant distinction between
providing policy analysis for pending legislation and
providing analytical assistance for legislative oversight
and evaluation. These classically discrete legislative
functions are in fact, part of a "push-pull" continuous
process.⁴⁴

Although CRS has recognized that the distinction between
policy analysis and evaluation is not very useful, Congress has
decreed that GAO should be responsible for program evaluation,
and CRS for the formulation of policy options. Hence, CRS
cannot formally classify its work as evaluation. Nonetheless, CRS
is an important determinant of whether program evaluations do or
do not influence the course of legislation.

The roots of the original Legislative Reference Service (LRS) go
back in history to its parent agency, the Library of Congress,
established in 1800 to support the work of Congress by providing
facts and information to the lawmakers. Library staff have
worked with members of Congress on the floors of both houses
and in committee chambers with committee staff. They have
provided information and references on a great variety of
information needed by the Congress, researched the facts behind
legislation, and ferreted out legal precedents. This broad mission
lent coherence to the Library's work and met the needs of
Congress for more than a century.

However, as the information needs of Congress increased and
grew more complex, Congress required a more specialized
reference service to keep track of and assist in its manifold
operations. Hence, the Legislative Reference Service was formed
in 1914 "to enable the Librarian of Congress to employ competent
persons to prepare such indexes, digests, and compilations of law
as may be required for Congress and other official use."⁴⁵

The demand for the new agency's services grew slowly through
the early 1940s. Most of its work was directed towards locating
and referencing information on issues before the Congress, and
compiling, abstracting, and indexing statutes. After World War
II, Congress further expanded the responsibilities of the federal
government. The social programs initiated in the depression, the
gigantic military undertaking during the second World War, and
the leading role America took in postwar world affairs all
contributed to the enlargement of the national government.

This growth required a corresponding growth in congressional
staff support. Simple reference work and legislative indices did not
satisfy the pressing congressional information needs. Congress
needed more comprehensiveness than the piecemeal analyses then
available to evaluate the ramifications of congressional actions
and identify and assess the impacts of government programs. The
job was larger and more complicated than it had been before and
members of Congress and their personal and committee staffs
could not do it alone. Staff members were spread too thinly and
were frequently selected for their political abilities, rather than for
their substantive expertise in policy areas.

45. Annual Report of the Congressional Research Service of the Library of Congress for

The Legislative Reorganization Act of 1946 represented an
attempt to adapt Library of Congress services to a larger
legislative role. It broadened the responsibility of LRS,
authorizing the Librarian of Congress to appoint specialists to
analyze and evaluate the substance of legislative proposals. The
forward-looking policy analysis role was seen as a logical adjunct
to the "watchdog" audit and review activities of GAO. GAO was
assigned the task of ensuring that the will of Congress was
executed, while LRS was to anticipate and illuminate the
implications of prospective congressional actions.

For the next 24 years, the LRS mission remained relatively
intact as the agency slowly expanded. Staff were added in a
number of substantive areas to examine emerging issues. A
Congressional Reference Division was established to do basic
reference work utilizing readily available resources. This "new"
addition, providing a specialized channel for responding to
requests for data and straightforward factual inquiries, actually
reemphasized the original LRS mission. However, it served also to
demarcate more clearly the function of policy analysis and
sophisticated research.

The Legislative Reorganization Act of 1970 accelerated the
expansion of the LRS and renamed the agency the Congressional
Research Service, leaving it in the Library of Congress but making
it more autonomous. The 1970 law also laid the groundwork for
greatly expanding the size of the service.⁴⁶ By the end of fiscal
1978, the number of CRS budgeted positions had more than
doubled to over 800. Annual appropriations nearly quadrupled
over the same period, rising to more than $23 million in fiscal
1978. The growth of CRS, like that of the General Accounting
Office, no doubt reflects the expanded oversight role of the
Congress and greater recognition among members of Congress of
the need for research and evaluation capabilities independent of
the executive branch.

46. Report of the Committee on Rules on H.R. 17654, Legislative Reorganization Act of
1970 (Washington: Government Printing Office, 1970), p. 19.

Two-thirds of the CRS staff are in the research divisions and
work primarily on substantive research and analysis. Twenty
percent of the staff are engaged in handling routine information
requests, and the balance are in administrative positions. The CRS
is broad-based and rich in experience. In fiscal 1978, about
two-thirds were professionals, embracing almost every imaginable
discipline, with the vast majority having advanced degrees. Some
have national reputations as established experts in their fields.

Administration Structure 

The present structure of the research divisions has remained
virtually unchanged since it was set up along university
department lines to carry out the mandate of the Legislative
Reorganization Act of 1946. As the scope of congressional
committees does not precisely correspond with that of executive
agencies, the CRS structure has been effective in pulling together
pieces of policy that have been scattered throughout the
government. For example, one study found federally funded
education programs in 23 executive agencies and education affairs
handled by 26 congressional committees.⁴⁷ With an organizational
structure that cuts across both the executive and legislative
decision lines and yet requires relatively few divisions, CRS has
succeeded in focusing the many disparate perspectives existing
within its policy work.

The central purpose of the Legislative Reorganization Act of
1970 was to improve and expand the ability of the Congress to
discharge more systematically and comprehensively its oversight
responsibilities. The CRS was chosen to assist Congress in this
activity. An alternative that was seriously considered was to add
one staff person to each standing committee solely to perform
review and oversight tasks. The idea was dropped because new
committee staff, it was presumed, would inevitably get involved in
other responsibilities. The alternative of creating a new support
agency devoted exclusively to oversight was also rejected because
of the time a new organization would need to get established and
because an oversight capability already existed in the General
Accounting Office and the Legislative Reference Service. It was
decided that an incremental approach was best, for necessary
change was seen as quantitative rather than qualitative.⁴⁸

47. Information Resources and Services Available from the Library of Congress and the
Congressional Research Service, U.S. Congress, House Commission on Information and
Facilities (Washington: Government Printing Office, 1976), pp. 99-100.

But quantitative change brought qualitative change as well. The
expansion of CRS staff has led to increased specialization. The
informal structure and close relationship with many members and
staff of the Congress that marked the old CRS has been difficult
to preserve in a large organization. By virtue of its increasing size
(and its past success), CRS is becoming a bureaucracy. Whether
the informality and person-to-person contact that was its hallmark
can be preserved remains to be seen.

In contrast to the General Accounting Office, which does much
of its work without specific congressional directives, CRS does
nearly all its work directly in response to congressional requests.
Working hand-in-hand with personal and committee staff, CRS
has evolved as a specialized information gathering and analytical
resource, geared to respond quickly to routine as well as complex
congressional inquiries.

Although there is no hard and fast rule for differentiating the
varied kinds of work that the Congressional Research Service
does, one crude but useful distinction that can be made is between
routine inquiries and requests for original research and analysis.
The former constitute the vast majority of the more than three hundred
thousand inquiries CRS received in fiscal 1978. Sixty-three percent
were answered within a day, 82 percent in five days, and 83
percent in ten days, a pattern which has remained fairly constant
over recent years. In terms of numbers, most of the inquiries
handled by CRS are requests for facts. The typical responses,
handled by a special reference division, involve verifying
information, collecting bibliographies, or photocopying library
materials.

CRS's major research projects are comprehensive examinations
of important policy issues by interdivisional teams. These
command the most staff resources and probably have the greatest
policy impact. They are undertaken in response to committee
requests or in anticipation of committee needs. Although reports
requested by individual members certainly are not ignored, those
prepared for committees are assigned higher priority by CRS
staffers who view the latter assignments as an important means of
bringing CRS expertise to bear on legislative policymaking.

CRS evaluations of social programs are conducted chiefly by
the Education and Public Welfare and the Economics Divisions,
which are concerned with economic analysis, employment,
education, vocational rehabilitation, housing, income support,
public health, collective bargaining, and economic development.
The issues involved in such broad subjects do not always fall
neatly into one division and so, in dealing with them, staff from
different divisions work freely together. For example,
"redlining," the restrictive lending practices of many mortgage
institutions, is a volatile political issue important to national
housing policy. Yet, because national action against redlining
touches a number of government policy areas, and because some
of the staff knowledgeable about redlining are not economists, it is
handled by public administration experts and political scientists.
Similarly, the Education and Public Welfare Division, rather than 
the Economics Division, coordinates econometric studies of 
alternative income support programs because of their close linkage 
to welfare reform policies. The exact division of labor defies 
consistent logic and cannot be depicted by most organizational 
charts, but the easy cooperation of staff from different divisions is 
effective in broadening the scope and strengthening the quality of 
work.

Many important subjects do not fit neatly into the categories of
traditional academic disciplines. In working on welfare reform
and other broad issues, CRS establishes interdivisional teams. The
use of such teams received special emphasis in the Legislative
Reorganization Act of 1970, which sought to strengthen the
comprehensive policy analysis capabilities of the legislative
branch. The workload of the Economics Division and the
Education and Public Welfare Division is large and growing.
Between them, in 1978, the two divisions responded to 26,000
inquiries. Although the great majority were routine requests for
information that were filled quickly, the volume of longer term
studies was the highest ever.

Evaluation in CRS 

CRS staff rarely assess program performance. Evaluation is, by
their standards, a "backward looking" exercise that is
subordinate to CRS emphasis on policy design and analysis.
Nonetheless, evaluation, by whatever name it is called, is a major
function of CRS. To know where it is going, Congress must know
where it has been. Good policy analysis, the examination of
alternative courses of action and of their implications, requires
an assessment of prior experience. For that purpose, CRS makes
much use of evaluations by executive agencies, the General
Accounting Office, private organizations and interest groups.
CRS examines those evaluations, digests them, and incorporates
the conclusions, if not the details, into policy analysis.

An assessment of the job that the Congressional Research
Service does in evaluating social programs has to recognize that
CRS was established to fill a support role. Although it strives for
nonpartisanship, it is very much a part of the congressional
decisionmaking process. The legislative milieu is one of turmoil,
with abrupt shifts in priorities and the sudden emergence, whether
real or imaginary, of pressing new issues as events and
circumstances dictate. It is hardly an atmosphere conducive to
careful and deliberate academic research. In that setting, much
CRS research is a race against time in which thoroughness
conflicts with demands for speedy delivery. The fact is, one of the
main purposes for setting up CRS was to provide Congress fast
service. As time is an overriding consideration on the Hill, CRS
must respond quickly to inquiries, no matter how complex. When
time is not critical, Congress is more apt to try some source other
than CRS.

Neither congressional staffs nor the General Accounting Office
can quickly provide such a wide range of expertise as CRS.
Congressional staff are spread too thinly to be able to analyze
issues carefully on a regular basis. The rigid emphasis that GAO
places on time-consuming investigative field work puts it at a
disadvantage, too. In CRS, highly specialized staff can be
concentrated in a single issue area. CRS can also use outside
organizations and consultants where in-house talent is short. This
streamlined research structure is well adapted to the congressional
pace, making it very popular. Possibly more than half of all CRS
research is conducted on a rush basis where quick, albeit less than
thorough, analysis is necessary if the product is to be of any use.
Legislative decisions have to be made quickly. The stress on speed
raises the question of whether CRS encourages relatively shallow
instead of more thorough analyses that might be planned in
advance.

There is no simple solution to the dilemma, but the CRS
approach has great merit. On the slower moving issues, CRS may
help committees to plan hearings, blocking out the major issues or
filling gaps in the broader, deeper base of knowledge presented in
hearings. In that setting, it is rarely the only source of analysis
relied upon by the Congress. Even during the 1974 energy crisis,
CRS supplied only a portion of the information and analyses.
Many other sources were also employed, and when pressure from
special interests delayed final action, closer scrutiny of the major
proposals was possible. That kind of haste is rare when social
legislation is enacted. Some would argue that congressional action
is as slow as social change. Still the legislative process is full of fast
turns and sudden decision points. Initial congressional response
may be changed and shaped in legislative debate. Different kinds
of analysis are persuasive at different times, and varied
approaches are needed to meet the needs of members. Regardless
of how much fact-finding has preceded congressional action, new
analysis and new proposals always arise. This is the kind of
situation in which CRS, it is asserted, can be most helpful,
although it is hard to pinpoint spots where CRS has been decisive
in resolving a major policy issue.

The essential value of the Congressional Research Service's fast
responses is to present reasonably unbiased information and
analysis adapted to congressional time constraints. When time
permits, basic data from a variety of sources are more carefully
reported. When time is short, it may be possible only to
summarize or extract a few sources. Such varied responses are
characteristic of CRS.

In evaluating social programs and policies, CRS has two
alternative and complementary procedures: reviewing available
literature, and collecting data from operating agencies. Although
time may be crucial in determining which methods are employed,
it is not the only factor. There are important qualitative
distinctions among the sources that also determine how CRS goes
about evaluating social programs.

Literature searches are the primary source of information for
CRS analysts trying to assess social programs and domestic policy.
When the Congressional Research Service was established,
members of Congress envisioned a research service that would
analyze and digest available information and analyses and
synthesize them in a balanced examination of the important
perspectives bearing on legislative issues. The lawmakers had in
mind a service that would do more than collect relevant studies,
but would stop short of gathering its own primary data. In an age
when the volume of information is as overwhelming as its content,
the objective has proven farsighted and enduring. The aim today is
to lay out the different perspectives, showing the "facts" and
analyzing opposing views. The hope is that the product will help
the legislators to reach a balanced view. For example, an
examination of the effectiveness of employment and training
programs analyzed and summarized a full spectrum of studies of
the impact of federal initiatives.⁴⁹ The studies had been done by
federal executive agencies, research groups, special interest
representatives, and scholars. In an evaluation of the Appalachian
Regional Commission, CRS analysts relied on a variety of sources
for information on the Commission's goals and effectiveness.⁵⁰
Much had been written about the Commission, and there was no
need for CRS to start an evaluation from scratch.

When available literature on program operations is inadequate,
as is often the case with new social programs, CRS is likely to rely
on operational data from executive agencies. This information is
particularly relevant in evaluating the effectiveness of those social
programs that are highly sensitive to changes in the economy. The
strain that the 1974-75 recession put on income support programs
raised major questions in Congress about appropriate legislative
responses. Existing literature was understandably inadequate to
shed new light on the problems. To fill the void, CRS analysts
relied on executive agency data to determine how well the
programs were holding up, and to gain insights into emerging
changes in program operations. After evaluating these data, CRS
presented its analysis of the situation and examined measures for
making adjustments in the income support programs. Ready
access to operational data is an important aspect of CRS program
evaluation and policy analysis. Without the close executive agency
rapport and the opportunity to use operating data as a basis for
evaluating policy effectiveness, analytical work in this area would
be stale and often useless.

Independence 

Although the informal cooperation of CRS and the executive
agencies has proven fruitful, it is not without problems. The
dependence of the service upon executive sources of information
raises doubts as to whether the data and analyses that CRS
supplies to Congress are tainted with biases of the operating
agencies whose activities Congress is attempting to assess.

49. Ray Schmitt, The Effectiveness of Manpower Training Programs: A Compilation of
Observations and Conclusions (Washington: Congressional Research Service, May 29,

50. John Mitrisin, A Selective Evaluation of the Appalachian Regional Commission
Program (Washington: Congressional Research Service, March 15, 1973).

The independence of the legislative branch in overseeing
executive activities has been a recurring issue throughout the
history of congressional support agencies in general and the
Congressional Research Service in particular. The Legislative
Reorganization Act of 1946 reflected the sentiment that the
Congress should be able to do its own analysis so that it could
independently assess executive performance and achieve a more
active role in formulating and analyzing policy options. The
analytical capability that CRS began to develop after the 1946
legislation progressed slowly, at best. The burgeoning social
programs in the 1960s intensified the issue of legislative
independence. Suspicion and mistrust between the two branches
were exacerbated under the Nixon administration. Many became
convinced that Congress would be at the mercy of the executive
branch if it placed excessive trust in agency evaluations and
perspectives. It was essential, according to this view, that the
legislative branch be able to generate its own data about executive
branch performance.

The Legislative Reorganization Act of 1970 and the Congressional
Budget and Impoundment Control Act of 1974 enhanced
the independence of legislative analysis. These two pieces of
legislation laid the foundation for a management information
system designed to provide congressional analysts with direct
access to certain agency budget and operating data. The laws also
stressed the importance of GAO program evaluations and
enlarged the staff of CRS, enabling it to conduct more analytical
work. The extent of the changes has been more quantitative than
qualitative, though. The automated system for retrieving agency
budget and program data is not yet fully operational, and it is
unlikely to have much impact on CRS evaluations. In a list of
priorities prepared by CRS analysts, the use of raw data ranks
low. When time is pressing, the analysis of operational details
seldom takes precedence over the use of existing evaluations.
Because of practical considerations, direct access to operational
data will probably continue to contribute less to independent
analysis than will the opportunity to discuss with executive
evaluators the strengths and weaknesses of their work.

When the data provided by the executive agencies are
inadequate or unreliable, CRS can, in theory, turn for help to the
General Accounting Office, which is better able to generate data
based on its own observations of onsite operations. Yet, in
practice, CRS staff rarely use GAO data for their evaluations.
GAO collects data for its own specific needs which seldom
correspond to those of CRS. Furthermore, when both
assess the same program or agency, it is usually for the purpose of
producing independent opinions. Consequently, CRS use of GAO
data might be self-defeating.

In any event, the operating statistics that GAO produces are
often inadequate for depicting current situations and are more
dated than the agency figures. Hence, CRS tends to prefer its
agency sources. The data are more accessible, suitable, complete,
and timely. In short, CRS staff members believe that independent
data are not as important as independent analysis.

Getting Outside Help 

Contracting out work is a new but growing development at
CRS. Outlays increased from $3,000 in 1970 to more than
$800,000 in fiscal 1978, contributing significantly to the agency's
independence. The number of persons being brought in from the
outside on a temporary basis has been increasing as the growing
demands on CRS periodically overtax its staff. Rather than trying
to enlarge that staff to meet every foreseeable inquiry, the trend
has been to contract for specialized services. The rationale is to
maintain a solid core of in-house staff for most CRS work and to
cope with any overload, and unusual or specialized requests, by
contracting. This strategy has been fostered both formally and
informally. The Legislative Reorganization Act of 1970 permits
CRS to issue contracts without advertising for bids, eliminating a
time-consuming and frequently wasteful process. On the informal
side, the fairly simple CRS structure and contract review
procedures reduce time and red tape. Because of these conditions,
rare among federal agencies, CRS can negotiate contracts much
more quickly than most executive agencies, preserving its quick
response capability.

The reasons for contracting may vary among CRS divisions.
The Economics Division relies on contractors for work that its
staff can, but are not available to, do. The Education and Public
Welfare Division uses them for new ideas or analytical techniques
that its staff still lacks. The latter is deeply involved in some highly
political issues; over the years it has developed closer contacts
with the legislative committees than have other CRS divisions
because committee members and their staffs heavily rely on it,
eager for nonpartisan guidance. Moreover, income maintenance,
the financial soundness of the social security system, and the
effectiveness of federal measures to deal with social ills are all
topics that inspire heated debates. In such areas, the limitations of
social science evaluation are keenly felt. New ideas and
information are manifestly needed. CRS has turned to consultants
and contractors in the hope that they can provide some fresh
insights or at least convince Congress of the complexity of the
issues.

The growing use of contracted services is becoming an integral
part of the agency's response capability. But two forces are at
work that may reduce the effectiveness of contracting. The first is
the "catch-22," the built-in hazards of a high volume of contract
work. The second is the evil attending the aging of bureaucracies.
When contracting was rare, it could be handled with dispatch by
short and simple procedures. But as the volume increased, formal
and more complex procedures were required so that all requests
could be screened systematically for their potential usefulness,
economy, and effectiveness. By 1975, handling contracts became a
full-time job for a contract officer. While the review procedure is
not a tangle of red tape, definite contract procedures have been
established, and the entire operation has taken on a formality that
was not there in the early 1970s.

Assessing CRS

Virtually the entire mission of the Congressional Research
Service is mirrored in the needs of the Congress. As the 1970
reorganization act stated: "Upon request, CRS will supply
committees with experts capable of preparing, or assisting in
preparing, objective, nonpartisan, in-depth analyses and appraisals
of any subject matter. These analyses and appraisals will
be directed towards assisting committees. . . ."

Most of the service's substantive reports synthesize and analyze
the work of others. It is assumed that findings are selected and
presented in a balanced, nonpartisan manner. CRS is supposed to
remain pristinely pure, apolitical, fair, and objective. Congress
hoped that it would ". . . insulate the analytical phases of
program review and policy analysis from political biases and
therefore produce a more credible and objective product."⁵²

The effectiveness of CRS must be judged largely by the degree
of congressional confidence in its work. The agency's managers
have duly noted this fact, and periodically they have asked their
congressional clients to evaluate CRS work. Although these
surveys do not cover all congressional users, do not employ
rigorous sampling techniques, and may be partly self-serving, they
do provide some insights into client opinions of CRS. The service
receives high marks, over 95 percent, for fast and pertinent
responses. Less positive though is the finding that CRS does not
provide the kind of comprehensive material that 20-25 percent of
the members want.

In 1975, CRS expanded the questionnaire to get more detailed
responses. Congressional users rated CRS reports on five criteria:
thoroughness, clarity, selection, balance, and overall quality.
About 80 to 90 percent of the respondents rated CRS high on
thoroughness, selection, and balance. Committee staff gave
consistently lower marks than members, perhaps because, given
their specialized responsibilities, they were more inclined to
recognize inadequacies in the CRS analyses.⁵³

Another important index of CRS work is its degree of
objectivity, i.e., does CRS prepare balanced reports that are free
of bias and give equal treatment to both (if not all) sides of an
issue? To satisfy 535 members of Congress on that score is
probably impossible. But CRS received high marks for balance
roughly 85 percent of the time from both Democrats and
Republicans.⁵⁴

Because of the complexity of the legislative and policymaking
processes, it is impossible to isolate the specific impact of CRS
evaluative work. Again, the proof of the pudding is in the eating,
and some evidence might be gleaned from congressional usage, the
agency's sole market test. It is probably fairly safe to assume that
congressional users will go back to the CRS sources that they find
helpful. Using this line of reasoning, the available facts indicate
that CRS must be doing something right. But the assessment must
differentiate between responses to constituent requests and
substantive products. The increases in "quickie" requests
frequently reflect a response to constituent inquiries and usually
require simply supplying a copy of a publication. These requests
may be an indicator that more people are writing to their
representatives in Congress rather than a test of CRS
performance.

The usefulness of CRS can be better judged, therefore, by the
rise in analytical work for committees. CRS has been devoting an
increasing proportion of its resources to this work. Both the
Senate Labor and Human Resources Committee and the House
Education and Labor Committee, traditionally heavy users of
CRS, have appreciably increased their requests for analytical
work.⁵⁵

53. Gary Lee Evans and Dan Melnick, "Report on the Results of the December 1975
Ff fffrpr4r Survey," Congressional Research Service, 1^ It. 1976.

54. Memorandum from Gary Evans and Dan Melnick to Norman Beckman, Acting
Director of the Congressional Research Service, M^^ 1976.

55 A!iimlRe«mioftbeC<KJr«ik«iJJiU««rcfaS^ 
prejwred for the Jotet ComnOttee on the Librtry (Wwhtaiton: Oovemnwtt Priottai 
Oflioe). 
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This sug^ts that CRS clients wetconte its help in the social 
policy area. Critics might respond that Congress is so desperate 
for information that it turns to the readiest source, r^ardless of 
its quality. A consensus view would have to be mixed. On the 
whole, CRS may not provide lK;tter information and analysis than 
anyone else, but it can provide them quickly and reliably. 
For Congress, ignorance is not bliss, and half a story in time is
better than the entire story too late.

The prime functions of evaluation in the executive branch focus
on administration and program operation, rather than on
oversight as in the legislative branch. Responsible for
implementing the will of Congress, the executive agencies must
examine if programs are meeting established objectives and how
effective various components are.

However, the purposes of evaluation vary among the executive
agencies. They run the range from a search for answers to specific
questions to the general advancement of knowledge and ideas. The
Office of Management and Budget (OMB) has, from time to time,
placed a strong emphasis on the "decision relevance" of
evaluations, implying an almost one-to-one correspondence
between an evaluation and a particular decision.1 But this narrow
application of evaluation has gained little currency. Evaluation in
the executive branch more commonly has the broader function of
raising the level of understanding about the impact of social
programs.

The evaluation scene in the executive branch is intricate,
confusing, and sometimes contradictory. Units with one or
another evaluation responsibility saturate every niche of the
bureaucracy. To catalog every evaluator or to assemble a complete
picture of evaluation units and their activities would be nearly
impossible and of little value. The Office of Management and
Budget discovered just how difficult it is to determine the scope of
evaluation activities when it attempted to "standardize" them.2 Its
proposed "general guidance and responsibilities for administration
of program evaluation activities within departments and
agencies"3 elicited little interest from program officials. Of
course, any such attempt is bound to meet some resistance, but in
addition, the arguments against standardizing evaluation formats
or management procedures were compelling. OMB defined
evaluation narrowly, excluding much that is considered entirely
legitimate; the "guidance" offered was so broad as to be useless.
In brief, OMB was no more successful than others in clearly
defining the scope of evaluation, and its clout proved a poor
substitute for intellectual substance. The mandate never got
beyond the draft stage and OMB efforts to rationalize evaluation
management have been suspended.

Learning from that experience, it is reasonable to conclude that
the search for a representative model of social program evaluation
in the executive branch is not a rewarding pursuit. Despite some
methodological similarities, the variations among agencies and
agency components are substantial, and generalizations about
evaluation activities in executive departments are of doubtful
value.

Four factors visibly affect the role and influence of evaluation
in federal agencies: the organizational location, the funding base,
the position and power of the people planning evaluation agendas,
and the channels for incorporating evaluation findings into policy.

Organizational location and, of course, budgets are particularly
useful gauges of the weight assigned to evaluation in an agency. It
is important to consider where the evaluation unit is lodged in the
hierarchy of authority and the range of functions assigned to it;
proximity to managers may enhance its influence on operations
while proximity to research may reduce it. While the level of funds
available for evaluation is obviously significant, the source of
support, in terms of set-asides versus budget line items, may be
no less important.

2. Ibid.

3. Fernando Oaxaca, "Draft OMB Circular on Evaluation of Federal
Programs," [...]

The organizational structure in which planning and managing
evaluation take place is a good indicator of its standing. Unlike
research, evaluation strives to have some immediate relevance to
agency policies, strategies, or tactics, and a direct relationship to
agency operations or plans. This requires genuine interaction
between program administrators and evaluators, starting with the
planning of evaluation projects. Yet, evaluators too frequently fail
to involve program managers adequately in their planning, and
vice versa.

The ways and extent to which evaluation findings are used are
obviously important in determining the nature of evaluation
activities. Just as the involvement of program people in the
evaluation planning stages is one indicator of how seriously
evaluation is taken, so is the willingness of agency decisionmakers
to predicate action on evaluation findings.

The Two Leading Agencies

The experiences of the Department of Health, Education and
Welfare and the Department of Labor offer more than mere case
studies. These two departments administer the largest volume of
outlays for social programs and account for 70 percent of
the approximately $140 million federal expenditure for social
program evaluation in 1977.4

HEW's Establishment 

The Department of Health, Education and Welfare, with the
largest budget and second largest workforce of any department,
has a sprawling evaluation establishment which spends
approximately $80 million annually. HEW provides so many social
services to American citizens that it is virtually impossible to
describe its mission in a single, coherent statement. Its manifold
missions reflect piecemeal growth as new legislative mandates have
piled upon old. The independence of HEW's individual agencies
and the confusion and overlap in their missions shape their
evaluation activities in a way that resists organizational
rationalization.

4. Statement of Wayne Granquist in Cost, Management, and
Utilization of Human Resources Program Evaluation, 1977, U.S.
Congress, Senate Committee on Human Resources (Washington:
Government Printing Office, 1978), p. 4.

Evaluation in HEW crisscrosses lines of jurisdiction and can be
highly competitive. Each of the six HEW assistant secretaries has
an evaluation office. The Office of Human Development and
Education Division focus on planning and budgeting operations,
while the Social Security Administration concentrates on research.
The degree of centralization varies. In the Office of the Assistant
Secretary for Education, one unit is supposed to monitor all
education evaluation. Yet, its cooperation with the National
Institute of Education and with the evaluation activities of the
Office of Education is minimal. In the Public Health Service, most
evaluation is conducted by six program agencies which are
relatively independent of the central Office of Policy Development
and Planning. The Office of Human Development, much smaller
than the Public Health Service, also maintains a decentralized
operation with each of five program agencies conducting its own
evaluations. At the top of this shaky pyramid (it is perhaps more
like a pile of gravel), attempting to provide some central guidance
and struggling, not too valiantly, to introduce a semblance of
administrative order and tidiness, is the Office of the Assistant
Secretary for Planning and Evaluation. This office has a nominal
veto power over the evaluation plans of agencies but, recognizing
the fruitlessness of trying to cast them all in the same mold, has
wisely not exercised it. It uses its pivotal position, instead, to
execute its own evaluation agenda in areas cutting across
organizational lines.

Complicating the jumbled organization of evaluation units are
their splintered budgets. With some exceptions, most evaluation
funds come from set-aside provisions specified in authorizing
legislation. The National Institute of Education funds evaluation
from its regular operating budget authorized to support research.
Two agencies in the Office of Human Development fund
evaluation from administrative set-asides. The Social Security
Administration has no separate budget for evaluation; its
evaluation is funded, instead, under the rubric of research. Before
it was absorbed into the Office of Human Development and the
Social Security Administration, the Social and Rehabilitation
Service funded evaluation from its research budget, with the
Department of Labor and the Community Services Administration
providing partial support on several joint programs. The Office of
the Assistant Secretary for Planning and Evaluation funds its work
by tapping up to 25 percent from agency evaluation budgets.

Advocates of evaluation argue that legislative set-aside
provisions assure financing for their activities. However, the full
amount of the available set-aside is rarely expended. The Public
Health Service, for example, has seldom used more than a third of
the available set-aside; in recent years, it has used only a fifth.
Underutilization of evaluation resources is mainly attributable to a
lack of staff for monitoring projects and assessing the findings. As
set-asides cannot be used to hire more staff due to congressionally
imposed personnel ceilings and administrative policy, the volume
of evaluation has been arrested below prescribed funding levels
and certainly falls short of achieving the congressional and
administration objective of evaluation.

However, even if the full evaluation set-asides were spent, it is
not certain that a marked increase in genuine evaluation would
result. Given the variety of definitions and conceptions of
evaluation, many administrators use evaluation funds for
activities that are more appropriately classified as research, on the
one hand, or program management, on the other.




Labor's Establishment

Labor is a smaller department than HEW, with a narrower
mission and a more compact evaluation establishment. But that
does not necessarily render its evaluation activities orderly or
coherent. There are intricacies in DOL too that muddle the way
evaluations are carried out and complicate the analysis of how
evaluation fits into the process of policy development.

The DOL organizational structure for evaluation resembles that
of HEW, but the smaller scale of operations and historical
developments create dramatically different conditions. The
Office of the Assistant Secretary for Policy, Evaluation and
Research (ASPER) is roughly analogous to the HEW Office of the
Assistant Secretary for Planning and Evaluation, albeit with
significant differences in substance and style. In theory, ASPER is
responsible for internal housekeeping duties, as well as its own
evaluation agenda. The former responsibility is intended to
provide technical assistance and review evaluation activities, while
the latter has involved investigating broad policy areas that cut
across departmental lines or go beyond the program concerns of
the Employment and Training Administration (ETA). In practice,
ASPER has, at times, exercised a heavy hand in shaping the
evaluation activities throughout the department, pursuing an
evaluation agenda based on limited operational knowledge which
has sometimes been completely inappropriate and even damaging
to program implementation.

Five operational programs in Labor have formal evaluation
offices, but only the Office of Program Evaluation and Research
in the Employment and Training Administration is sufficiently
developed to make policy contributions. Excluding the temporary
demonstration projects mandated by the Youth Employment and
Demonstration Projects Act, the lion's share of evaluation is
undertaken by the ETA Office of Policy, Evaluation and
Research. ETA accounts for most of the dollars allocated to the
entire department, and spends a commensurate proportion of all
departmental evaluation and research funds.

The Turf Factor

Among the competing interests that vie for program resources
and administrators' time, evaluation normally gets a low priority.
Given the choice, most administrators would probably pare it
down to eliminate a source of aggravation and augment program
operations. But they do not have much choice. Legislative
mandates and top policy decisions frequently override managers'
preferences. Occasionally, demands by the news media or groups
interested in the performance of a specific program can also be
irresistible.

At HEW there is an unwritten doctrine of agency sovereignty in
program evaluation, a reflection of the federal system where each
bureau chief controls his or her own fiefdom. Given the countless
missions of HEW, it is assumed that each agency head generally
knows best. The Assistant Secretary for Planning and Evaluation
usually offers technical assistance only when asked. Although that
office reviews all agency plans, including some invitations to
prospective evaluators, most agency evaluation proceeds with
remarkable independence. Substantial evaluation funds are
available at the disposal of the assistant secretary for financing
evaluations without interfering with agency plans. It is also not
likely that the assistant secretary has the power to intrude on
agency policy. But some of the independence that agency
evaluators enjoy is due to a belief in the assistant secretary's office
that little is to be gained, intellectually, politically, or
programmatically, from a more centralized system.

The decentralization of HEW evaluation may be the best
arrangement for that department, but it certainly involves costs.
Quality control suffers, and could be improved by a more
vigorous central review. Some agency work also appears to be
parochial and self-serving. While in existence as a separate agency,
the Social and Rehabilitation Service conducted successive
evaluations of vocational rehabilitation programs. Relying upon
questionable methodological designs, most concluded that these
programs were very effective in making their clients employable.
The National Institute of Education has repeatedly been attacked
by congressional critics for abstruse studies having little bearing
on important policy issues. In a fit of pique, the Senate
Appropriations Committee went so far as to eliminate the
Institute's entire budget for fiscal year 1977. Although the funds
were later restored, the basic criticism still held. The Institute's
evaluation work was independent from the Office of Education
and largely divorced from the latter's operational responsibilities
because HEW lacked the machinery to exercise constraints.

No such claims of agency sovereignty could be made in the
Department of Labor. In recent years, evaluators' activities at the
assistant secretary level have been overbearing in contrast to the
limited power exercised by their HEW counterparts. Political
considerations may account for some of the differences. HEW
agencies have their own constituencies to support and defend
them. The Social Security Administration, Office of Education
and National Institutes of Health all operate under legislation that
gives their heads statutory power independent of the secretary.
The legacy of HEW's creation, an afterthought consolidating a
number of previously existing agencies, is a chain of command
that does not always give the secretary the final say. In contrast,
the Department of Labor preceded the establishment of its
component agencies with the exception of the Bureau of Labor
Statistics. Agency heads have not enjoyed the statutory autonomy
of their HEW counterparts, and many of the programs, young
and still evolving, have not developed strong outside
constituencies.

In keeping with the strong central control in DOL, the central
evaluation agency in the department has gained a tight grip over
all the department's research and evaluation. What is now the
Office of the Assistant Secretary for Policy, Evaluation and
Research was established in 1963 "to anticipate changes that
would affect wage earners, and to develop policies and programs
to promote their welfare in the context of such developments."5
Focused initially on policy studies and planning, ASPER virtually
ignored serious evaluation efforts. Whatever evaluation was
undertaken in the department was located in what is now the
Employment and Training Administration.

In the several years that followed, ASPER stayed out of the
mainstream of policy development. Then, in the early Nixon
administration, it sought to enhance its participation in
departmental policy formulation and evaluation. But the
Manpower Administration could not be easily dislodged from its
preeminent role. It had the funds and the troops to rebuff any
onslaught from ASPER. In the early 1970s, ASPER was briefly
influential, having hired some qualified evaluation staff, and its
top officials had the ears of the department's policymakers who
identified critical policy issues that needed evaluating. ASPER
also provided some very effective direction to evaluation activities
throughout the department to assure their responsiveness to the
needs of departmental heads and to use evaluation findings for
policy development. However, this era was short-lived.

[...] Evaluation and Research, U.S. Department of Labor,
Washington, January [year illegible], p. 1.

The frequent change of secretaries in the early 1970s upset the
delicate relationship that had given ASPER its power. While its
technical competence remained intact, its influence waned. Its
senior officials were rebuffed and disregarded by a succession of
secretaries and ASPER lost most of its effectiveness in leading the
department's evaluation activities and in directing its own
evaluation to relevant policy questions. Through the mid-1970s,
its evaluation guidance was marked increasingly by a slavish
devotion to methodological rigor and an absence of policy
relevance. Cut off from the principal policymakers, it imposed
itself on the rest of the evaluation establishment as an uninvited
meddling model manipulator. In the late 1970s, however, stability
in the office of the assistant secretary, better access to top
policymakers in DOL, and a willingness on the part of those
policymakers to pay attention to ASPER all appear to be
improving its impact.

ASPER's influence over agency evaluation begins with the
planning process. Although it has a review function much like that
of the HEW Office of the Assistant Secretary for Planning and
Evaluation, this is a pro forma exercise because of its prior
involvement in the formulation of evaluation plans. ASPER
personnel participate in the planning process by identifying topics
for study and dictating the methodologies. Where agency
evaluation plans do not address the issues that ASPER considers
to be important, or where their methodologies are not suitably
sophisticated (regardless of the reasons), ASPER steps in with its
own ideas. In budgeting agency evaluation activities, ASPER
holds the purse strings and also has considerable influence over
agency thinking. In 197S, research and development committees
were set up in each program area to institutionalize previously
informal communications between evaluators and administrators.
The purpose of these committees is to pick up where evaluation
planning leaves off, maintaining a sense of rapport between
program people and evaluators during the conduct of evaluation.
Although the groups do not always contribute appreciably to the
success of evaluation projects, ASPER is always involved and uses
the opportunity to bring its opinions to bear.

In the role that it has carved out, ASPER is not just an adjunct
to evaluation planning and management, but serves as another
administrative layer superimposed on agency evaluators. It
duplicates their responsibilities in a way that often pits ASPER
opinions directly against agency thinking, creating what may be,
in many cases, an unnecessary tension between departmental and
program level interests without the benefit of a complementary
management process to resolve them.

In the case of the fledgling Occupational Safety and Health
Administration (OSHA), ASPER was up to its usual form. In
OSHA's first few years of existence, there was confusion about
the agency's mission, to say nothing about a lack of evaluation
expertise and mistrust of the role of evaluation. ASPER took what
amounted to an adversary role, and further contributed to
OSHA's difficulties by evaluating its activities with little regard to
legislative mandates or stated agency objectives. Program
performance was roundly criticized from all quarters and
evaluation was seen by some as one more opening for criticism.
Consequently, little attention was given to evaluation within
OSHA and any attempts to impose it from the outside were viewed
with suspicion.

Not surprisingly, agency evaluators in the past have not entirely
appreciated the intrusion of ASPER's housekeeping activities
either. In many cases, its technical assistance is seen as
interference. ASPER's original role was to make the final review
of evaluation plans. However, because ASPER perceives the
agencies' in-house capabilities as being limited, it has often been
involved at every stage of evaluation management from
preliminary planning to final product review (agency evaluators
see this as nitpicking attention to details best handled by them).
ASPER's attempted involvement has been encouraged by the
weakness of agency evaluation staff. But the factors that inhibit
evaluation by agency staff have also inhibited ASPER's efforts.
To overcome these methodological and substantive obstacles
requires not regimentation but a thorough understanding of
agency missions and need for knowledge.

In the past, agency evaluators have not regarded their limited
staff size and capabilities as adequate justification for ASPER's
activism, which they have seen as an unwelcome intrusion of
abstract thinking into a pragmatic environment. ASPER's
independent attempts to evaluate departmental programs have
been seen as well-intentioned but inappropriate. Whereas ASPER
staff have believed that agency evaluators embrace outmoded and
inferior methodologies and approaches, many program evaluators
and managers have thought that ASPER was, to borrow the late
Jacob Viner's apt phrase, building models without vital organs,
that is, models with little relevance to agency needs. Early in its
existence, the Occupational Safety and Health Administration,
charged with regulating workplace safety and health conditions,
had disagreements with ASPER about evaluating the economic
impact of regulatory actions. OSHA had prepared inflationary
impact statements based on the cost of compliance to assess how
to phase in enforcement of its regulations most economically.
ASPER prepared cost-benefit analyses based on assigning dollar
values to benefits of the lives saved and injuries prevented and the
costs of controlling occupational health hazards. These analyses
were used in selectively applying regulations where benefits
exceeded costs. OSHA, which had to clear its impact statements
through ASPER, disagreed with the latter's approach on a
number of grounds. In a report that accompanied his resignation,
one former assistant secretary for OSHA stated that ". . . the
methodology associated with such [cost-benefit] analyses is in its
formative stages. The variance placed on any estimates of dollar
benefits of disease and death is so great as to be virtually useless."
He further criticized ASPER for opposing the spirit of the law,
which required the uniform application of regulations.6 In short,
an operating agency criticized ASPER for applying a methodology
that was unrelated to its real policy choices.

Another criticism leveled against ASPER has been that its staff
are too far removed from program operations to be sensitive to
their problems. Evaluations favored by ASPER have tended to
examine micro- or macroeconomic issues, relying heavily on
assumptions but little on actual program experience. In contrast,
agency evaluations address narrower and more practical
programmatic issues. Many resemble process evaluations,
assessing the results rather than basic underlying premises of
program strategies. In 1974, the Manpower Administration
completed an evaluation of the public employment program; the
following year, ASPER completed a related evaluation of public
service employment programs. The former focused on the effect
of the program in job creation/job restructuring, characteristics
of the workforce, and changes in municipal services. The latter
also examined the effects of public service jobs on the composition
of municipal labor forces, but went further. Indulging in
simulations (read guesses) to estimate the impact of the public
service program, it tried to analyze more fundamental economic
issues such as how public service jobs affect local labor markets,
behavior of labor, and aggregate levels of employment. The
results were publicized as Department of Labor findings, although
the conclusions were based more on the ideologies of the
evaluators than on facts.7 A 1976 study by the Employment and
Training Administration examined the effectiveness of the Work
Incentive Program in placing AFDC recipients in permanent jobs
and improving their chances of becoming economically
self-sufficient. In contrast, a 1974 report by ASPER concentrated on
the macroeconomic issues of net increases in aggregate output,
income distribution effects of WIN, and the trend of aggregate
AFDC costs over time.8

In short, the ASPER evaluations tended to speculate about
broad policy issues, filling the gaps in programmatic data with
heroic assumptions. But there was little appreciation in ASPER's
work for the reality of program operations and the manner in
which they impinge on policy. The style reflected the fact that
ASPER evaluators were more removed from decisionmakers than
were agency evaluators. Hence its criteria for judging program
performance differed and its results conflicted with those of the
operating agency. The tension this created was natural and
predictable. While agency staff felt somewhat threatened, ASPER
staff justified it as a natural process for airing all sides of an issue.
The ASPER standards for performance were not necessarily
undertaken as attempts to second-guess agency-sponsored
evaluations; rather they were merely seen as alternative
perspectives. But the reality and expectations they implied were
misleading and, rather than presenting the clarification of another
legitimate view, served to confuse. Few policymakers or news
media personnel could realistically evaluate the heroic
assumptions of model builders and their often spurious exercises
were too often mistaken for rigorous analysis. The net effect of
ASPER's rambunctiousness was to alienate many program managers
and reduce ASPER's ability to lead credible evaluation activities in
the department.

Despite many attempts, ASPER was not able to reestablish
itself as a serious policy analysis arm of the Labor Secretary until
1977. Then continuity in personnel, good rapport between the
offices of the secretary and the assistant secretary, and marked
improvement in ASPER's ability to direct impact evaluations
began to bring ASPER back into the mainstream of serious
program evaluation and policy analysis. However, much of this
progress has been interrupted by pressures to respond to outside
priorities. Prodded by the White House to tackle such issues as
comprehensive employment and training legislation and welfare
reform, ASPER has had to neglect some narrower, but
nevertheless important, departmental policy concerns.



Managing Evaluation 

The quality of management— of selecting, grooming, and 
monitoring contractors and consultants— is crucial to the success 
of evaluation projects. In the executive branch, very little 
evaluation is done in-house. Nearly all is done under grant or
contract.

At HEW (excluding the Social Security Administration),
in-house projects account for no more than 5 percent of all
evaluations and of all set-aside expenditures. In addition, outside
projects are more readily identifiable than in-house evaluation,
which is often indistinguishable from routine management review
and analysis.

However, some clearly identifiable evaluation activities are
conducted in-house. The utilization of evaluation findings for
administrative and policy purposes is an important in-house
activity, although it commands only modest resources. In this
utilization process, contract reports are only an intermediary
product or raw material. For example, practically every evaluation
effort in the Office of Education (OE) collects data from state and
local education offices and institutions. If the information needed
is not part of an ongoing record file, OE contracts for its
collection, perhaps after some statistical tests to assure proper
sampling and an optimal organization of the data. OE then uses
the contractor's product for analysis to answer the primary
evaluation questions. Similarly, the Social Security Administration
contracts most of its field survey work and then analyzes the
data in-house. Once evaluations have been completed, most
agencies prepare "policy implication papers" (PIPs) and executive
summaries of the findings.

At the Department of Labor, a much larger proportion of
evaluation work is done in-house. As late as 1975, in-house efforts
accounted for about a fifth of all evaluations.9 In keeping with its
academic aura, most of ASPER's in-house studies had involved
investigation of specialized methodological issues of doubtful
policy significance. The result did little more than add another
layer of obfuscation and belabored technical detail. The analyses
responded less to specific needs than to the aspirations of ASPER
analysts to advance their academic standing.

In contrast, the evaluations of the Employment and Training
Administration and its predecessor, the Manpower Administration,
have reflected operational needs, and the in-house staff
studies have, for the most part, aimed at program management
problems. The advantage that the in-house evaluations presumably
enjoy is that, because they are not made public, they
encourage more candor on the part of program officials. While
probably little of substance in the uncirculated reports is lost to the
world because of this emphasis on management issues, the practice
fosters a kind of dualism in which managers respond differently to
in-house and to contract evaluators.

Some officials and outside observers would prefer more
in-house evaluation in the executive branch, arguing that the
insights, cooperation, and continuity that can be achieved with
in-house staff are vital if evaluation is to be more than a pro forma
exercise. The logic of these arguments notwithstanding, there are
extremely strong forces limiting the amount of in-house work
being done now, and there is not likely to be much more of it in the
near future. All executive agencies operate under personnel
ceilings that hold their employment to congressionally determined
levels. An increase in evaluation personnel would require a
decrease somewhere else, robbing Peter to pay Paul. Given the
relatively low priority accorded to evaluation work, such increases
in evaluation personnel are unlikely. However, in spite of the
ceilings on personnel, which create the illusion that the size of the
federal payroll is kept under strict controls, the funds available for
evaluation are relatively generous. These funds are frequently used
to hire contractors, consultants, or persons on loan from other
agencies and institutions who are not reckoned against agency
personnel ceilings. Hence, the full-time equivalent of evaluation
personnel supported by HEW and Labor under contract and
consultant arrangements far exceeds the number who are
employed in-house. One Labor-sponsored study found 58
consulting firms doing business with Labor and HEW along a
six-block stretch of downtown Washington.10

Staff shortages may also place constraints on the number and
quality of contracts. Despite the extensive use of contracts, the
Public Health Service has been able to spend only a third of its
set-aside funds because it has lacked the staff to develop requests
for proposals, review bids, monitor contractors, or review
completed evaluations. If other HEW agencies spend more of
their set-asides, that is partly because they give less attention than
the Public Health Service to overseeing contracts. Most evaluation
staff agree that some contract work is off target, incomplete, or
otherwise unsatisfactory because staff resources have been
stretched too thin. While it would take only a few additional staff
to improve HEW's contract monitoring, it would take a large
number (several hundred) to conduct all evaluations in-house.
And, by contracting, HEW can choose from a larger pool of talent
than it could hire directly.

At DOL, in-house staff have not been stretched so thinly. As the
number of personnel has declined, contract resources have also
declined or grown only slowly. The sole exception has been in the
Office of Youth Programs, which is benefiting from a massive
infusion of resources for evaluation, research, and demonstration
in connection with demonstration legislation enacted in 1977.
Except for that office, evaluation budgets have not grown in
proportion to aggregate agency budgets, as none of Labor's
agencies operates under set-asides. Management is also simpler,



10. Albert D. Biderman and Laure M. Sharp, The Competitive Evaluation Research
Industry (Washington: Bureau of Social Science Research, Inc., 1972), p. 33.



being controlled by a centralized office rather than dispersed
among many programs. As programs have been added, evaluation
functions have been grafted on to existing organizations, lending
more continuity and coherence to evaluation policies. The ability
to conduct a sizeable in-house study program indicates a better
balance than HEW enjoys.

But staff limitations are not the only reason for conducting
evaluation by contract. It is presumed that because of persuasive
influences within agencies, agency personnel evaluating their own
programs have a hard time being objective, and an even harder
time achieving credibility for their studies. This has been the
underlying assumption that has been so instrumental in keeping
contractors in business.11 Less attention has been paid to the
putative objectivity of contractors whose interest it is to please
their sponsors. A survey of social policy experts done for the
Research and Technical Programs Subcommittee of the House
Committee on Government Operations lent support for this
rationale as far back as the mid-1960s. One observer stated:

I do not think that federal agencies should themselves
conduct any kind of social research. My reason is that
the value of any kind of social research is greatly
influenced by the amount of autonomy of the researcher.
I think that it is very easy for intra-agency social
research to become a political and policy football.

Unfortunately, relying on outside talent for an evaluation does 
not automatically guarantee independent work that will be free of 
agency biases. Not wishing to be killed for bearing ill tidings, the 
hired evaluator who depends on getting evaluation contracts or 
grants from an agency may be less than candid about program 
shortcomings. 
But disregarding for a moment the relative merits of outside and
in-house performers, there is a singular virtue to setting aside a pot 
of money for outside evaluation instead of using the money for 
in-house personnel. Administrators tend to downplay the 
significance of evaluation, and when it is not specifically provided 
for in legislation and must be done by in-house personnel, it can be 
too easily pushed aside by higher priorities. 

One alternative approach adopted by Labor to overcome
personnel ceiling constraints and at the same time assure the
availability of evaluation assistance is to establish nonprofit
organizations that couple evaluation with program administration
chores. Largely financed by federal funds, but with some seed and
"mad" money provided by foundations to assure a degree of
independence, these organizations conduct research, demonstrations,
and evaluations. The Manpower Demonstration Research
Corporation is possibly the most prominent of the "intermediaries."
The 1977 youth legislation requiring massive research and
evaluation spawned Youthwork and the Corporation for Youth
Enterprises. There are many ways to skin a cat, and government
officials are learning the skills needed to survive and to carry out
their functions.

Outside Evaluation

Since fiscal 1974, most HEW-sponsored evaluations have been
done under contract. Before restrictions were introduced in 1974,
HEW obtained much of its outside evaluation by grant. The latter
device had its advantages for both program administrators who
still had a lot to learn and hustling grantees who had a lot to earn.
As the Great Society programs got underway in the 1960s,
program administrators realized the gaps in their knowledge for
dealing with the new challenges, and the grantees were only too
happy to make an effort to find answers to uncertain questions.
The response was a move to evaluate programs and support
research activities furthering understanding of social problems
and the options for alleviating them. Grants were a logical way to
attain the objectives. They were open-ended and left the grantee
with a pot of money, free to investigate whatever issues were
considered important. If program administrators did not know
what questions to ask, they left it to the grantee to figure out the
line of inquiry. The results were predictable. Some of the grants
yielded relevant findings; others contributed to a more general
body of knowledge. Many efforts turned out to be nothing more
than fishing expeditions or, more frequently, examinations of
issues which interested the investigators (who often proposed the
projects in the first place). They were off target, slow in coming,
and of little use to decisionmakers. They did little more than
satisfy the curiosity investigators had about their own pet topics
and raise their living standards.

In 1974, as part of a movement to require competitive bidding,
HEW virtually ruled out any further support of evaluations with
grants. Part of the rationale was that effective program evaluation
demanded assurances of more agency control and decision
relevance than provided by grants. They were deemed inadequate
vehicles for obtaining timely and useful information and answers
to specified questions. But there were other motives as well. Many
political appointees of the Nixon and Ford administrations were
uncomfortable with the evaluators who had been the beneficiaries
of the grant system and with the policy implications of some of
their work. The hope was that contracts would inject an
entrepreneurial spirit into the evaluation establishment, encourage
competition, and attract pragmatic investigators whose message
would be more to the liking of agency policymakers having greater
control in selecting issues for evaluation. Another reason behind
the push for competitive contracting was the notion that
evaluation could be treated as a commodity and that there was a
market for evaluation services that functioned reasonably well.
Both assumptions have remained unproven but nonetheless
inviolable.

Today, grants are still used extensively for research and selected 
demonstration projects having built-in evaluation component 
requirements. However, contracts are now preferred for
evaluation projects. 
In DOL, no formal department edict has proscribed evaluation
by grant, but a long-standing policy favors the use of contracts
over grants. The glib assertion is that grants are used when it does
not matter what happens to the money. While that understates the
obligation that grantees have to the department, contracts do
impose performance obligations that are unrealistic for grants.
For this reason, contracts are used almost exclusively in the
evaluation work where the agency can specify what it is seeking,
while grants are made to academic institutions and for efforts
supporting a specific researcher when few funds are involved.

In sum, evaluations have been viewed as necessary to meet
specific information needs, and contracts have given both HEW
and Labor greater control over deadlines and the course of
research. But they have not eliminated substantive problems of
evaluation, such as the ability to raise the right questions, to
provide guidance, or to evoke and sustain the best efforts of
investigators.

To Compete or Not to Compete

Agency staff may use either noncompetitive or competitive
"Requests-for-Proposal" (RFP) contracts. In noncompetitive or,
as it is often called, sole-source procurement, an agency selects an
evaluator without going through an open bidding process that
brings other contractors into consideration. The recipient of a
sole-source contract is obligated to provide an agreed-upon and
specified product or service.

Sole-source awards account for some 60 percent of all federal
contracts and grants. The justifications for this type of
procurement are that a particular evaluation calls for unique
capabilities, and that the contracting agency knows who the best
performer is. The unique qualifications may include the fact that
the contractor may have the acknowledged experts in a field,
proprietary rights to necessary information, or distinctive prior
experience. Whatever the precise reason, the presumption is that
no one else can do the job as well or as promptly for the same price
or at any reasonable price.
Sole-source contracting has other important advantages. It is
usually faster than a competitive process. Despite the extra
justification it entails, it can usually save two or three months and
frequently more by eliminating the preparation of a request for
proposals, advertising, and reviewing proposals. Sole-source
awards also encourage closer relationships between evaluators and
agency officials. They indicate a personal or intellectual
compatibility and trust between the agency officials and the
contractor. Sole-source evaluators tend to be repeaters much more
often than competitive award evaluators. The continuity reduces
the time an evaluator needs to become familiar with agency
procedure and facilitates the flow of information. Under these
conditions evaluations are often better adapted to agency needs.
By relying upon recognized expertise, sole-source procurement
tends to put more emphasis on the finished product as contrasted
with the RFP route where agency officials are obliged to assign
staff for preparation of proposals.

Of course, noncompetitive procurement is subject to abuse
when the healthy rapport turns into a cozy relationship between
officials and evaluators. Evaluators quickly learn the party line
that officials want to hear. Agencies may find themselves paying
more for sycophancy and getting less information than they would
in an open market. Because of the potential for abuse, sole-source
procurements are governed by special regulations.

Both HEW and DOL require extensive documentation to justify
all sole-source awards. Cumbersome review procedures control
the award of any contract in excess of $25,000. The process may
reduce favoritism, but it introduces delays and red tape. This
frequently discourages resort to sole-source and, as a result, the
award may go to the applicant who has the resources to cope with
the obstacles rather than to the one best qualified to do the job. Not
surprisingly, the General Accounting Office and Congress view
sole-source awards as smacking of cronyism and favor the tight
control of the practice. The Nixon and Ford administrations
favored competitive bidding because of ideological convictions
and a suspicion of the social scientists who have tended to receive
noncompetitive awards. The Carter administration has also



sought to reduce sole-source awards. Accordingly, contracts for
evaluations have generally been declining in number and size.
Since the line between research and evaluation is frequently
blurred, it is difficult to quantify the number of sole-source
awards that might be classified as dealing with evaluation. But one
count shows that in fiscal 1971, the Office of the Assistant
Secretary for Planning and Evaluation in HEW awarded 65
percent of its contracts and 57 percent of its funds in sole-source
form. Two years later the ratio of noncompetitive contracts
declined to about a third of the total funds, along with a decline in
the number of contracts.13 By 1978, officials estimated that less
than 10 percent of HEW evaluation funds were awarded
noncompetitively.

At Labor, sole-source procurements for research have not met a
similar fate. The Office of Policy, Evaluation and Research
prefers sole-source to competitive contracts for the same reasons
that it prefers grants to contracts. Its strongest argument is that to
procure research by competitive contract is to treat the buying of
knowledge and insights on the same basis as the procurement of a
tank or a typewriter. Unlike hardware procurements, research
reports are highly individualized, the products of distinctive
craftsmen and professionals that defy advance specifications.
Moreover, if research awards were funded exclusively by
contracts resulting from a competitive process, many academic
scholars would avoid the employment and training arena.
Competition forces research into a production mode that rewards
the grantsmanship capability of the more experienced consultant
organizations and crowds the academics out of the market.
Sole-source contracts also help research managers keep in close
contact with the academic world and arrange quickly for special
studies that may be needed.

Labor's Assistant Secretary for Policy, Evaluation and
Research consistently relies almost exclusively on sole-source
contractors for evaluations. As ASPER does not award grants for
research, sole-source contracts serve as an effective substitute.

13. Departments of Labor and Health, Education, and Welfare Appropriations for
1977, Appropriations Committee, U.S. House of Representatives, 94th Congress, 2nd
Session (Washington: Government Printing Office, 1976), p. 9H, and committee files.
At HEW, the Social Security Administration's Office of
Research and Statistics (ORS) has done a great deal of sole-source
contracting in the past. Unlike other HEW evaluation units, ORS
is heavily involved in basic research as well as program evaluation.
It also does a great deal of in-house work. By one estimate, slightly
more than half of all the ORS contracts and contract expenditures
in 1975 and 1976 were sole-source. However, by 1978, ORS was
succumbing to departmental directives and awarded none of its
contracts on that basis.

The competitive award process hinges on the request for
proposals (RFP) or bids from prospective contractors. RFPs are
prepared by agency evaluation staff with assistance from program
personnel and agency administrators. As an extension of the
evaluation planning process, agency staff may specify the
objectives of the project, the methodology and sampling
specifications, allowable statistical margins of error, and general
format requirements. The RFP poses the questions to be answered
and the "specs" or conditions that must be met, so that potential
bidders can determine whether they should enter the competition.
If they decide to compete, the RFP serves as a guide for their
proposed work plan and budget.

RFPs can play an important role in shaping the nature and
quality of the evaluation product, indicating the approach that is
required and the time allotted for completion of the project.
Unfortunately, many RFPs fail to do so. Although the General
Accounting Office has yet to issue a report on the preparation of
RFPs by agency personnel, individual GAO investigators have
found it often to be a case of the contractor tail wagging the
agency dog. Too often the RFP lists broad requirements
applicable to any evaluation, without specifying the methodology,
objectives, and data that are sought in a given project. In these
cases, the contractors must add flesh to the skeleton and determine
the approach they will take on technical issues and even, at times,
define the issues. Although they are not in the position to
determine the policies and programs to be evaluated, they often
determine the relevance of evaluation findings.
Once the request for proposal has been prepared, an open
competition may ensue in which any contractor is free to submit
bids; alternately, a more restricted procedure of negotiations with
selected contractors may be employed. Open competition is
preferred by the General Accounting Office and contract officers
more often than by evaluation offices. But it is slow, costly, and
does not always assure that the best product will be obtained.

In open competition, the availability of an RFP is announced in
the Commerce Business Daily, a government publication for
advertising federal competitive procurements. In a number of
cases, agencies take the pains to mail RFPs directly to qualified
contractors to assure that they are notified of the competition.
Contractors who choose to compete submit proposals and bids.
The bids are then reviewed by a committee of project officers,
program personnel, and contract officers, and an award is
negotiated with the contractor who is ranked first on a composite
score in which points are assigned for such factors as cost,
methodology, the qualifications of staff, and prior experience
and performance.
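
The composite-score ranking described above can be illustrated with a
short sketch. The text names the factors but not the point values, so
the weights, the 0-100 scale, and the three firms below are purely
hypothetical assumptions chosen for illustration, not figures from any
agency's actual review procedure.

```python
from dataclasses import dataclass

# Hypothetical weights (summing to 100); the text lists the factors only.
WEIGHTS = {"cost": 25, "methodology": 35, "staff": 20, "experience": 20}


@dataclass
class Bid:
    contractor: str
    points: dict  # factor name -> points (0-100) given by the review committee


def composite_score(bid, weights=WEIGHTS):
    """Weighted sum of the committee's points across all factors."""
    return sum(weights[factor] * bid.points[factor] for factor in weights)


def rank_bids(bids, weights=WEIGHTS):
    """Return bids ordered from highest to lowest composite score."""
    return sorted(bids, key=lambda b: composite_score(b, weights), reverse=True)


# Invented example bids.
bids = [
    Bid("Firm A", {"cost": 90, "methodology": 60, "staff": 70, "experience": 80}),
    Bid("Firm B", {"cost": 70, "methodology": 85, "staff": 80, "experience": 75}),
    Bid("Firm C", {"cost": 80, "methodology": 70, "staff": 60, "experience": 60}),
]
ranked = rank_bids(bids)
print([b.contractor for b in ranked])  # negotiations would begin with ranked[0]
```

Under these assumed weights the lowest-cost bidder (Firm A) does not
rank first, which shows how much the outcome depends on the weights
the committee chooses.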

Notwithstanding its attractiveness, open competition has its
costs. Being open, the process is, on the surface, very egalitarian.
But this requires the agency to undertake the effort and expense of
mailing RFPs to many contractors who are only remotely
interested in bidding. After bids are submitted, the agency must
mount an intensive effort to weed out the unqualified bidders.
One study of competitive procurements found that 13 open
solicitations yielded more than 2,500 requests for RFPs and netted
286 bids. The RFP mailing, responding to inquiries, and the
subsequent review process required a massive effort by the
contracting agencies.14
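
The yield implied by these figures can be worked out directly. The
sketch below uses only the two numbers quoted above; because the study
reported "more than 2,500" requests, treating the count as exactly
2,500 is an assumption, and the resulting rate is therefore an upper
bound.

```python
# Figures quoted in the text for 13 open solicitations.
rfp_requests = 2_500   # "more than 2,500", so this is a lower bound
bids_received = 286

# Share of RFP requests that resulted in a bid.
bid_rate = bids_received / rfp_requests

# Number of RFPs requested for every bid actually submitted.
requests_per_bid = rfp_requests / bids_received

print(f"bid rate: {bid_rate:.1%}")
print(f"requests per bid: {requests_per_bid:.1f}")
```

On these numbers, roughly one request in nine led to a bid.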

In addition to the direct expense to the agency in open
competition, the cumulative expense to contractors can be
staggering. The Bureau of Social Science Research reviewed 444
proposals submitted for 12 awards valued at $4 million. Interviews

14. Albert D. Biderman and Laure M. Sharp, An Analysis of 36 Competitive
Procurements of Social Program Evaluation Studies (Washington: Bureau of Social
Science Research, Inc., 1974), p. 11.
at fifteen firms submitting bids established that the cost of
proposal preparation ranged from $200 to $16,000, with a median
of $3,000. The authors estimated that it cost contractors more
than $1.3 million to bid for $4 million worth of contracts.15
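
The text does not say how the authors reached their estimate. One
reading that reproduces it, offered here only as an assumption, is to
multiply the 444 proposals by the median preparation cost of $3,000:

```python
# Figures quoted in the text.
proposals_submitted = 444
median_preparation_cost = 3_000     # dollars per proposal
total_contract_value = 4_000_000    # dollars, for the 12 awards

# Assumed method: every proposal costs the median amount to prepare.
estimated_bidding_cost = proposals_submitted * median_preparation_cost

# Bidding cost as a share of the value of the contracts being sought.
cost_share = estimated_bidding_cost / total_contract_value

print(f"estimated bidding cost: ${estimated_bidding_cost:,}")
print(f"share of contract value: {cost_share:.0%}")
```

The product is $1,332,000, consistent with "more than $1.3 million,"
or about a third of the value of the contracts awarded.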

Following classical economic doctrine, supporters of open
competition assert that the high cost of incompetence drives
unproductive and inferior contractors out of the market.
However, if it assures a higher quality of contractor in the long
run, it is only at high overhead cost to the survivors. Furthermore,
that analysis, as well as the entire RFP mechanism, is based on the
heroic assumption that RFP quality correlates well with the
quality of evaluation products. As a consequence, it appears that
an inordinate amount of the "expertise" purchased annually goes
into preparing the RFP.16 In actuality, the system of open
competition puts a premium on proposal preparation, not project
performance.

Agencies sometimes expedite procurement by distributing an
RFP to a restricted number of organizations rather than
advertising it for open competition. The distribution may be based
on reputation or on prior contract or grant work with the agency.
The attempt is to foster a degree of competition sufficient to make
a good choice but not wasteful in time and money. The restricted
competition approach signifies that the agency is seriously
interested in proposals from designated contractors. It improves
the response to RFPs and nets a higher proportion of good
proposals. However, it also leaves agency officials open to
challenges from organizations that were left out and may result in
delaying the award of a contract.

Preselection practices vary. In one case, the National Institute
of Education combined fifteen evaluation proposals and then
mailed the package to 447 preselected bidders. Usually, however,
preselection is more restricted. The median number of bids invited
in one survey of evaluation contract offerings was 13. The survey

15. Biderman and Sharp, The Competitive Evaluation Research Industry, pp. 3S-39.
16. Albert D. Biderman and Laure M. Sharp, "The Evaluation Research Community:
RFP Readers, Bidders, and Winners," Evaluation, Vol. 2, No. 1, 1974, pp. 36-40.
showed that, when open competition was employed, one in nine
organizations that asked for an RFP submitted a proposal,
whereas with preselection, nearly half of the organizations
responded.17

The Performers

Because the executive branch does so little social program
evaluation and research in-house, the qualifications and selection
of performers are vital to the federal evaluation effort. What
performer characteristics are important in influencing the overall
quality of the work? The question has no easy answers. Different
conditions and circumstances call for different strengths in
evaluators. Even when a precise set of qualifications is needed, it is
not always clear which organizations can best supply them.

The research and evaluation community is often dichotomized
into for-profit and nonprofit sectors. For-profit organizations are
commercial ventures organized to make money. They are
frequently viewed as hired guns, ready to evaluate anything for
anyone under any terms and conditions, so long as they make a
profit.18 The nonprofit organizations, either affiliated with
academic institutions or independent entities, are supposedly more
mission oriented than for-profit firms, concentrating on specific
policy areas, disciplines, or methods. In fact, many compete in as
wide a range of areas as any for-profit group.

Clearly, this dichotomy is misleading. The real distinctions
between the two kinds of organizations are more obvious to an
accountant or tax lawyer than to an agency evaluator looking for a
conscientious performer. Many nonprofit organizations are just as
promotional as the most spirited for-profit firms. Although they
are tax sheltered, theoretically serving educational and scientific
purposes, they are in fact often in the thick of competition, vying
with one another and with for-profit firms for the same work.



17. Biderman and Sharp, An Analysis of 36 Competitive Procurements of Social
Program Evaluation Studies.

18. Ilene N. Bernstein and Howard E. Freeman, Academic and Entrepreneurial Research
(New York: Russell Sage Foundation, 1975), pp. 5&^.
Many are in the midst of an identity crisis over everything except
whether to retain their tax exemption, and some have relinquished
that status.19 Some observers see no difference between them and
for-profits: "In many cases the so-called nonprofit organization is
in fact profit making in every sense of the word with the exception
of tax status."20 The frequent movement of staff among
contracting organizations and the use of consultantships,
subcontracts, and joint bids to bind organizations together further
blur distinctions among the different kinds of institutions. For
example, academics, the group whose contributions presumably
make the work of nonprofit and academic institutions more
"rigorous," frequently serve as consultants to for-profit firms and
nonprofit institutes. From the standpoint of agency administrators,
the profit/nonprofit categorization is of little value in
determining who should do a particular evaluation.

Another tempting way to categorize performers is according to
the academic qualifications of their staff. However, there is little
demonstrated relation between the quality of an organization's
performance and the numbers and kinds of degrees its staff
have.21 As in the case of teaching, degrees are far from perfect
proxies for the substantive performance criteria.

Judging the Evaluators

Because of the variety of conflicting demands placed upon
evaluators, the task of determining who performs best cannot be
undertaken uniformly and never with a great deal of confidence.
Government evaluation is constantly caught in the tension
between the demands for thoroughness and timeliness, for
scientific rigor and policy relevance, for independence and
familiarity, and between epistemological limits to what research
can explain and the virtually unlimited numbers of questions



19. Orlans, pp. 137-138.

20. John H. Kofron, The Use of Social Research in Federal Domestic Programs, a staff
study for the Committee on Government Operations, U.S. House of Representatives
(Washington: Government Printing Office, April 1967), p. 112.

21. Bernstein and Freeman, pp. 99-134.
officials can reasonably and usefully raise. While the tensions do
not always pull in exactly opposite directions, they constantly tear
at work in process, diffusing its resources and objectives and often
weakening its intellectual and practical impact. Because there is no
single objective to government evaluations, no constant order of
priority among its many objectives, and no valid and reliable way
to differentiate among contractors, no one contractor or type of
contractor is "best." Academic, nonprofit, and for-profit
organizations each manifest a great range and variability in the
quality of performance.

Methodological competence is often considered a strong point
of academic evaluators. They are considered well qualified to
conceptualize the goals of evaluation and to understand the
theoretical underpinning of social programs and the forces that
make them work or keep them from working.22 But
methodological competence does not assure the usefulness of
findings. This may not be a serious weakness in academic circles,
but from the public policymaker's point of view, it is crucial. It is
hard to justify a useless study.

Methodological adequacy is a necessary but insufficient
condition for effective evaluation. For management and efficiency
studies, sophisticated methodologies are often unnecessary.23
While evaluation is more sophisticated today than in the early
1960s, it is not because of any profound conceptual progress.
There have been no theoretical breakthroughs in understanding
the social and intellectual problems which evaluations address.
The breakthroughs have been strictly technical, primarily in the
ability of computers to process and manipulate vast quantities of
data. As one observer put it, "We've had an engineering advance
but not a scientific advance."24

22. Bernstein and Freeman, pp. 83-98.

23. William A. Morrill and Walton J. Francis, "Evaluation from the HEW
Perspective," paper presented at the Federal Executive Institute Workshop on Program
Management, May 3, 1976, p. 4.

24. Clark C. Abt, "The State of the Art of Program Evaluation," Legislative Oversight
and Program Evaluation, proceedings of a seminar organized for the U.S. Senate
Committee on Government Operations (Washington: Government Printing Office, May
1976), p. 314.
A performance criterion that sometimes conflicts with
methodological adequacy is timeliness. How quickly can a
performer conduct an evaluation; how well can he meet a
deadline? How well can he balance the quality and timeliness of
information? Greater certainty costs time and money. The
collection, analysis, and presentation of added information raise
costs and delay decisions. An accomplished performer must make
many judicious compromises between the pressing demands for
timeliness and for precise, reliable, and convincing information.

The demand for instant evaluations may jeopardize
methodological adequacy by restricting the time for longitudinal studies
or experimentation. It may serve to reduce sample sizes, to inhibit
painstaking and meticulous procedure. The New Jersey
Graduated Work Incentive experiment encountered just these
kinds of problems when political pressures forced a premature
presentation of findings. Most evaluations are subject to similar,
although less dramatic, pressures.

There is no empirical evidence on the point, but profitmaking
organizations claim to be the most dependable in meeting
deadlines because that is their bread and butter. Academicians are
presumably more loath to cut methodological corners, because
they are more interested in presenting a well-documented case than
a quick and dirty analysis that relies on intuition and judgment as
much as on hard, empirical data. Academia assigns greater value
to careful, time-consuming research than to less careful, if more
timely, work. The payoff in the halls of academe in prestige and
honor is for rigorous research which serves to advance a scholarly
discipline rather than research meeting the practical needs of
government. One analysis done as a follow-up to a congressional
study reported that "among the kind of research singled out as
inappropriate for universities were projects . . . providing quick
answers. . . ."25 Given the time pressure on many evaluation
studies, it stands to reason that the academic role might be limited.

Social programs cannot be evaluated adequately by the concepts
and methods of a single academic discipline. Neither the effects of
the programs nor the problems they grapple with are
unidimensional. There is more to unemployment than deficient
aggregate demand or a mismatch between skills and available job
openings. A variety of factors (political, sociological, psychological,
historical, even medical and physical) also come into play.
Welfare dependency is more than a sociological or psychological
phenomenon; there are hard economic facts to consider. Housing
problems are similarly the products of a constellation of
interlocking forces. Because the problems are not one-dimensional,
their adequate evaluation and resolution cannot be
one-dimensional either. An ideal evaluation should conceptualize
research and analyze all the interrelated aspects of a program and
then design the elements of a new and better program. It should be
rounded; or, in the parlance of the trade, the product should be a
multidisciplinary intellectual enterprise. In practice, few
evaluations achieve these goals and program administrators normally
settle for less. There have been few renaissance men and women in
the evaluation business.

For-profit organizations are fairly well staffed and organized to
bring a breadth of expertise to bear on an evaluation project. They
are not hindered by an academic (or governmental) structure that
interferes with picking up (and dropping) specialized assistance
fairly quickly. That gives commercial organizations flexibility. In
contrast, academic organizations are likely to favor a narrow
disciplinary approach. The basic cause of this rigidity that may
pass for rigor is, again, the structure of academic rewards and
incentives. There is a premium on specialization and pushing a
particular discipline or subdiscipline to the limits. Academic
departments, associations, and journals are similarly oriented. An
evaluation of a social program that is methodologically sound,
even if useless for policy formulation, is much more apt to be
published in an academic journal than a more balanced assessment
that examines several dimensions of the program's effectiveness
which affect social and economic policy. This sovereign discipline
mentality encourages a rigid approach to the academic study of
social programs and contributes to the gap between conclusions
that are conceptually sound and those that are politically and
administratively feasible.

As a result, traditional academic performers are normally not
held in high esteem by executive evaluation officers. Most prefer
instead to do business with nonprofit and profitmaking
organizations. Labor's ASPER, oriented more towards research
than evaluation, has relied on academicians for evaluations. In
some respects, its experience with them has left much to be
desired, contributing to ASPER's difficulties. They have tended to
force evaluations into narrow, fashionable, and artificial channels
which oversimplify or ignore the major dimensions of programs
lying outside their disciplinary blinders. More recently, ASPER
has departed from the narrow and rigid evaluation emphasis,
enhancing its effectiveness in the policy arena as a result.

Serious problems have also been encountered in the putative
ability of academicians to extend the frontier of understanding of
today's society and the government's roles in it. The highly
specialized nature of evaluations and research makes it difficult to
achieve a progressive improvement in our understanding of the
effects of social programs. For example, sophisticated analyses of
the factors affecting labor force participation go off on parallel
lines that never intersect. And while the sociological or economic
aspects of the problem may be explored in much detail, a unified
explanation that might be useful in formulating comprehensive
new policies eludes the analysts.

Searching for a middle ground between relatively superficial
analyses of many profitmaking groups and the narrowness of
academic work, some observers turn to research institutes housing
specialists from different disciplines. The assumption is that by
bringing diverse specialists together to concentrate on a particular
policy area, both depth and breadth can be achieved. But the
assumption has been questioned.

Interdisciplinary research . . . bears somewhat the same
relation to the world of the mind as the idea of "man"
or a global society bears to the world of nations: an ideal
infrequently realized. Ralph Linton once remarked that
the only genuinely interdisciplinary thinking took place
when two disciplines were united in one mind. . . . But
to assemble under one roof scholars from many
disciplines does not necessarily bind their knowledge
together any more firmly than separate papers are
bound together in a book. Genuine intellectual
integration of different disciplines remains rare, and the
more disciplines that must be integrated, the rarer it is.

Many close observers believe that the most important factor in a
successful and influential evaluation is the evaluator's experience.
Repeat performers know the standing of a program in a
department and the Congress, the relative importance of different
program services and constituencies, and the possible
consequences of their findings. They are also familiar with agency
personnel and the intricacies of bureaucratic politics, and their
staying power may indicate a degree of commitment to certain
policy areas. The success of repeaters may therefore also reflect
their ability to consistently second-guess the biases of the proposal
reviewers.

Assuring continuity in evaluations is problematical, though.
For years, the HEW Office of the Assistant Secretary for Planning
and Evaluation used sole-source contracting to achieve continuity
in evaluation. The Office of Policy, Evaluation and Research in
Labor's Employment and Training Administration still uses
sole-source contracting, justifying it partly on the grounds that it
contributes to a smoother accretion of knowledge. The Office of
Evaluation manages to have a large volume of its evaluation work
done by repeat performers. Contract bidding is certainly not
"fixed"; the repeaters' success is based on their consistent
submission of better proposals. Relatively few bidders compete
for the office's small contracts, fewer than 40 on average,
compared to over 100 that many HEW offices receive for each
RFP. In important evaluations with more than one phase, or in
assessments of pilot programs, agencies sometimes negotiate
sole-source contracts for later phases if the contractor does an
adequate job on the first. Many educational and income
maintenance evaluations have been conducted this way.

It is hard to get a consensus among evaluators as to who can
best institutionalize, or at least foster, continuity. The argument
that for-profit organizations are not the most suitable to sustain
continuity may hold more true for small, struggling firms with
high staff turnover than for large, more firmly established ones.
But even where turnover is minimal, the same staff may not be
continuously assigned to a long-term project. Firms often use their
most prominent staff to prepare proposals and then assign less
experienced personnel to conduct the work. However, the same
phenomenon occurs at universities and research institutes. To get
their money's worth, knowledgeable evaluation officials focus on
the personnel to be assigned to a specific study rather than on the
overall reputation or "classification" of the organization.

The Independence of Evaluators

An important consideration in appraising and utilizing the
findings of evaluation projects is the degree of independence of
the evaluators. Some see this as the chief determinant of reliable
evaluation, more significant than methodologies, credentials, or
vantage point. The argument is that only an investigator with no
vested interest in the findings can conduct a dispassionate inquiry
and reach an objective conclusion.

The credibility of an evaluation by an agency responsible for
administering the program is a recurring theme in discussion of
federal accountability. Congressional action on the Legislative
Reorganization Act of 1970 and the Congressional Budget and
Impoundment Control Act of 1974 reflected this concern.
Congress wanted assessments by evaluators who were perceived to
be objective and independent of program administrators. That
wish underlay the new emphasis on program evaluation by the
General Accounting Office. The feeling was that GAO analysts
reporting directly to Congress would be more responsive to the
needs of the Congress in carrying out its oversight functions. But
the entrance of GAO into social program evaluation has not
disposed of the objectivity issue. Even if GAO were technically
equipped to undertake evaluations of social programs, it could not
conduct all the evaluations that are needed by the Congress, let
alone executive agencies.

27. Michael Scriven, "Evaluation Bias and Its Control," Evaluation Studies Review
Annual, V. Glass, editor (Beverly Hills, CA: Sage Publications, 1976), p. 120.

Although in-house evaluation is not necessarily biased, it is not
the most prudent policy to leave the fox to guard the hen house.
With self-evaluation, the appearance of objectivity is hard to
sustain. Evaluators are subject to gross and subtle institutional
and programmatic pressures. Daniel P. Moynihan stressed the
difficult position of in-house evaluators, suggesting that objective
evaluation of social programs is a contradiction in terms:

The commitment to evaluation research is . . . fundamentally
ambivalent; one of attraction and fear, trust
and distrust. This is so not only because research of this
kind can blow up in an administrator's face when it
turns out his programs show little or none of the effects
they are supposed to achieve, but more importantly,
because in areas of social policy, facts are simply not
neutral, however much we would hope to treat them as
such. In social science data are political.

Moreover, many agencies lack the necessary qualified staff. The
logistics of conducting evaluation work with limited staff
resources dictate that corners be cut. Sampling has to be restricted
and heavy reliance must be placed on guidance and data provided
by program operators, raising again the question of credibility.
Moreover, experience so far with in-house evaluation shows that
results are often not disclosed to the public. This may be justified
when the study involves some of the finer points of management,
but certainly not when it discloses major program failings.

In view of its obvious limitations, little in-house evaluation is
conducted and the trend is to eliminate it altogether. Almost all
HEW evaluation work and the majority of Labor's are conducted
by outside contractors. Contracting for outside brains and hands
does not eliminate conflicts of interest, but does put a certain
distance between the person whose ox is being gored and the
person whose ox is doing the goring. The issue focuses on which
performers are most likely to be objective. Some argue that
for-profit organizations are subservient to program administrators
and whoever else butters their bread. They will lick and not bite
the hand that feeds them. As one observer noted:

[A] profit-making organization which either has a
commercial interest in a particular product or conducts
research for a firm with such an interest faces a conflict
between the dispassionate pursuit of knowledge and the
danger of uncovering truths harmful to that commercial
interest; it is likely to concentrate its attention on
profitable truths.

But in this respect, too, the distinction between for-profit and
nonprofit performers may be more illusory than real. All groups
want a roof over their heads and money for groceries. The issue
centers, therefore, on whether a performer wants an encore.
Faculty do not depend on contracts, since their basic income
derives from their teaching, though they benefit from them.
Hence, they have a greater degree of independence from federal
sponsors than the staff of organizations which subsist primarily on
federal funds. But the price for the independence of faculty can be
steep. Ad hoc investigators lack the familiarity with a program
that continuing association produces and which is helpful or
necessary to render an evaluation realistic and useful.
Furthermore, professors are no less subject than the staff of private
organizations to ideological biases and the conviction that one
methodology or theory is consistently superior to another.

"Independence" and "objectivity" ("naivete" may be a more
accurate word) bear costs: unfamiliarity with and insensitivity to
program personnel, clientele, and operational conditions;
uncertainty about contract performance; and ignorance of
political and administrative realities. The more control
government staff exercise to keep evaluations relevant, the less
independent the evaluator will be. An evaluator with complete
independence from the sponsor is also likely to be completely free
of contractual obligations.

It is reasonable and proper to ask if the cost of much
independence and objectivity is too high. A persuasive case can be
made that the federal executive branch has shirked its direct
responsibility for evaluation and wrongly allocated the staff at its
disposal to less important duties. Granted an independent
assessment of programs is necessary; but this does not absolve
administrators from conducting their own assessments of the
efforts that Congress entrusted to them.

Like everything else associated with evaluation, there are no
simple solutions and certainly there is no single approach to the
selection of performers for different kinds of evaluation. The
minimal requirement of a good evaluator is a thorough
understanding of the social programs and policies under study and
the context in which they operate. No one can gain that kind of
understanding except by personal involvement or close and
continuous observation. The former condition calls for in-house
evaluators; the latter, for private evaluators who have made a
personal commitment to the subject.



Can Evaluation
Make a Difference? 



The Need for Continuous Assessment

Federal outlays surpassed the half trillion dollar mark in fiscal
year 1979. Transfer payments alone accounted for nearly a sixth
of all the disposable income available to the American people.
Although this expansion of federal responsibilities has not been
universally acclaimed, it is irrefutable that the government is
becoming increasingly important in our daily lives. In a
democracy, it is now more crucial than ever that the citizenry, not
to mention the President, Congress, and public officials, be able
to assess the impact of governmental activities. Rising concerns
over government credibility make it especially important to
establish appropriate means of finding out what the government's
diverse missions are and how well it is accomplishing them, and
that these findings be reported to the public.

Agency reports and press releases convey glowing, superficial,
or soporific accounts of governmental operations, but rarely a
candid, rounded, or realistic one. In the 1960s, the fashion was to
proclaim that every new social effort was on the mark and
contributing to a better society. In the succeeding decade,
excessive promises have been replaced by a pervasive negativism,
with the usual pronouncement being that social programs have
failed. President Jimmy Carter was elected on an anti-Washington
platform, although once in office, he has followed the path of his
predecessors in supporting existing programs and adding some
new ones. It is no surprise, therefore, that the public has been
confused about the true impact of government social efforts.

In the private sector, a market test determines if a commodity or
company is successful. According to conventional economics, the
consumer is supreme. However, the market test measures only the
costs and income of a company, not the nation. It effectively
ignores real social costs that accrue to consumers and the public
which are not reflected in market prices and production costs. For
example, the price of a car does not include the costs of
automobile accidents. Presumably, a safer car could be built, but
at a much higher price to consumers. The market test has its flaws;
but for the purposes for which it is intended, it is an incisive test of
the quality of products and services.

That kind of test is not applicable to government programs. The
closest possible approximation is whether Congress will "buy" a
program and how much it will "pay" (appropriate) for it. If a
program is refunded annually, it is viewed as a success. This
version of the test is necessary but inadequate for determining the
"success" (the social costs and benefits) of a government
program. What is needed is a more rigorous test of merit and
effectiveness. Congress rarely has adequate data to determine
whether a given project or program is achieving its predetermined
goals. Faced every year with the decision of whether to continue a
program, Congress too often depends primarily upon past funding
levels for setting future appropriations. Of course, Congress may
also respond to the pressures of program supporters and critics.
Objective analyses of what a program does and does not
accomplish figure only marginally, if at all, in the decisions.

The limited use of evaluations in policymaking has been due in
part to the dearth of definitive and pertinent assessments of
government programs. Until recent years, this lack was not felt
that strongly. As long as federal expenditures were rising, most
social programs were virtually guaranteed continued, if not
increased, appropriations. But those fat years may have ended,
and there may no longer be something for everyone. Difficult
decisions will have to be made. The new congressional budget
process that forces expenditures to be considered in competition
with one another, taxpayers' discontent, and the inflationary
pressures generated by huge federal deficits have all focused
policymakers' attention on the need for ranking and selecting
programs which merit support and curtailing those that fall short
of the mark. The threats of retrenchment make it imperative to
develop appropriate data and criteria upon which administrators
can make decisions in their efforts to enhance the efficiency and
effectiveness of social policy.

Conceptually, the solution is simple. The most effective
programs should be continued or expanded and the least effective
should be axed. Regrettably, legislators and administrators find
that the necessary data are almost never in existence and whatever
evaluations are available are rarely, if ever, definitive. Despite the
pressing demand for usable program evaluations, they are not
forthcoming. Necessity may be the mother of invention, but it
cannot create the supply when serious conceptual and technical
obstacles are in the way.

Obstacles to the Use of Evaluations

Administrative, substantive, and methodological problems
continually stand in the way of evaluators charged with reviewing
social initiatives. The administrative problems are a product of
bureaucratic in-fighting, turf protection, and power politics, while
the substantive and methodological difficulties are inherent in the
nature of social programs.

The principal administrative obstacles stem from the
unwillingness of program officials and employees to have critical
performance data publicized. Controlling access to information
about program operations and results prevents meddling by
outsiders, be they professors, representatives of the public, or
legislators who are responsible for the appropriation of funds to
continue the effort. Another administrative dilemma arises in
choosing an evaluator. Depending on the purpose an evaluation
must serve, the product might be needed to provide a timely
report, reflect an outside opinion, serve as a diagnostic tool, or
bring in elements of quantitative rigor. No single evaluation
can serve all purposes.

Agreeing on standards for measuring program accomplishments
presents a tremendous substantive problem. Progress is difficult to
measure when the destination is hazy and the start of the journey
unmarked. The direction of social efforts is seldom
straightforward, and unclear legislative mandates usually muddle it
further.

A chronic methodological impediment to objective evaluation is
the difficulty of obtaining control data. Controlled social
experimentation can be prohibitively expensive, ethically
untenable, or an inadequate guide to effects on a national scale.
Without control groups, though, an evaluation can draw only limited
conclusions about the net impact of a program.
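
The logic of control data can be made concrete with a small
numerical sketch. The Python fragment below is illustrative only: the
earnings figures, the pre-program mean, and the function name
net_impact are assumptions invented for the example, not data or
methods drawn from any study discussed here. It contrasts a naive
before-and-after gain with the net impact measured against a control
group.

```python
from statistics import mean


def net_impact(participant_outcomes, control_outcomes):
    """Mean participant outcome minus mean control-group outcome."""
    return mean(participant_outcomes) - mean(control_outcomes)


# Hypothetical post-program annual earnings in dollars (invented figures).
participants = [5200, 6100, 4800, 7000, 5900]
controls = [5000, 5600, 4700, 6400, 5300]

# Hypothetical mean earnings of the participants before enrollment.
pre_program_mean = 4500

# Naive comparison: credits the program with every dollar of change.
before_after_gain = mean(participants) - pre_program_mean

# Controlled comparison: subtracts what similar non-participants earned.
controlled_gain = net_impact(participants, controls)

print("before-and-after gain:", before_after_gain)
print("net impact against controls:", controlled_gain)
```

In this hypothetical case the before-and-after comparison credits the
program with a gain of 1,300 dollars, while the comparison with
controls attributes only 400 dollars to it; the remainder would have
occurred without the program.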

Evaluators encounter other methodological barriers when they
try to determine how effectively resources are being used in
different program approaches. Benefit-cost and cost-effectiveness
methodologies are vulnerable, first, because of their reliance on
the efficiency criterion in evaluating expenditures. The question
"Who benefits from social spending?" is indeed as important as
"Does society experience any net gains?" Measurement problems
associated with these approaches also compound the
methodological difficulties that plague social evaluations. The simple
benefit/cost paradigm is made grossly deficient by the presence of
a host of non-quantifiable, non-measurable costs and benefits.
The models are either burdened with explicit qualifications or
ignore so many important variables as to be useless, representing
nothing more than hollow exercises and intellectual games.

Given the inherent impediments, the question arises whether the
assessment of social programs can help to contribute to the design
of sound public policy. Military hardware evaluators can
unequivocally test the destructive capability of a weapon system.
But social programs cannot be assessed as precisely since the
essential criteria are not always readily observed or measured.
Contrariwise, in social program assessment, there is a tendency to
attach excessive importance to criteria which can be measured. It
is frequently difficult to specify program objectives and it is
equally hard to track down all program effects, for the latter
spread out like waves in attenuating circles. Hence, the more
thorough and conscientious an evaluation, the less likely is it to
yield sharp, definitive conclusions. As one critic has remarked,
"the obstacles to scientific evaluation of retraining programs are
fundamental and serious. Even a well-conceived and executed
study . . . does not make a convincing case that training programs
affect employment at all." The criticism, to be sure, is no more
"scientific" than the claims of achievements made by program
advocates.

Methodologists trying to promote quantitative techniques have
discovered how elusive certainty or even objectivity can be with a
model that does not adequately reflect reality. Supposedly
objective studies of vocational rehabilitation programs have
produced ratios of benefits to costs ranging from less than 1 to
nearly 100. Other evaluations of the same program and data have
reported variations in ratios of more than 40 percent, depending
on the choice of discount rate, an issue which is itself subject to
an endless debate without any clear resolution. "Rigorous"
analyses of Job Corps data have yielded similarly confusing
results.
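
The sensitivity of benefit-cost ratios to the discount rate is easy
to demonstrate. The sketch below, in Python, assumes a deliberately
simple model that is not taken from any of the studies cited: a
one-time program cost paid up front and a constant annual benefit
received at the end of each year. The dollar amounts, the 30-year
horizon, the two discount rates, and the function names are all
hypothetical.

```python
def present_value(annual_amount, years, rate):
    """Discounted value today of a constant amount received each year."""
    return sum(annual_amount / (1 + rate) ** t for t in range(1, years + 1))


def benefit_cost_ratio(annual_benefit, years, rate, upfront_cost):
    """Present value of benefits divided by an undiscounted upfront cost."""
    return present_value(annual_benefit, years, rate) / upfront_cost


# Hypothetical program: $10,000 per participant now, $1,000 a year in
# added earnings for 30 years. Only the discount rate differs.
ratio_low_rate = benefit_cost_ratio(1000, 30, 0.03, 10000)
ratio_high_rate = benefit_cost_ratio(1000, 30, 0.10, 10000)

print(f"ratio at  3% discount rate: {ratio_low_rate:.2f}")
print(f"ratio at 10% discount rate: {ratio_high_rate:.2f}")
```

Under these assumptions the same program and the same data yield a
ratio of roughly 1.96 at a 3 percent discount rate and roughly 0.94
at 10 percent, so the choice of rate alone decides whether the
program appears to pay for itself.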

The other side of the coin shows that when the evidence of
success or failure is clear and convincing, the matter is already
self-evident. Evaluators do not make their observations in a
vacuum or from a unique vantage point. They often rely on
operational data already known to program administrators. A
resounding success or failure does not go unnoticed unless
program officials are engaged in a full-scale cover-up or
opponents are indifferent to the effects associated with the
program, an unlikely occurrence in either case. The on-the-job
training component of employment and training programs was a
clear success; officials knew that before the evaluators presented
their airtight case. The evaluations were important to indicate the
dimensions of success and provide some explanations for it, but
the success surprised no one who was familiar with the program.
The contribution of evaluation in that kind of situation is
incremental.

The Limited Effects of Evaluation

Most evaluations have, at best, only modest effects on the
development or refinement of policies and the implementation and
administration of programs. The degree of agreement between
evaluation findings and prevailing policy is often the most
important determinant of how findings are regarded. Institutional
and political inertia, manifested in the form of traditional values
and long-standing policies, exerts a powerful influence upon the
impact of an evaluation. An evaluation of a popular program that
can be used to justify additional appropriations has more
"impact" than one that criticizes it. The converse is also true:
when an unpopular program is a liability to an administration, a
critical evaluation may become a convenient rationale for
action, the straw that breaks the camel's back. In short, existing
values and interests, more than anything else, dictate the ultimate
impact of social program evaluations.

The Job Corps experience is illustrative. This program was a
controversial, albeit small, piece of the Great Society mosaic. It
represented a comprehensive effort to provide a second chance to
deficiently educated, unskilled youth from debilitating homes and
poverty backgrounds. A thorough and expensive program, it
captured much of the driving spirit of the Great Society's efforts.
The Job Corps was an experimental program involving a host of
unknown variables. It was assessed from its much ballyhooed
beginnings, and the evaluators reached as many conclusions as
there were studies. The decisions to continue the Job Corps were
based, to some extent, upon these evaluations, but to a larger
extent upon the conviction that the program was moving in the
right direction and would show later returns. And as long as no
viable alternative emerged, the antipoverty warriors clung to the
Job Corps concept.

3. National Council on Employment Policy, "The Impact of Employment and Training

The Nixon administration, relying on the same studies, cut the
Job Corps drastically. But new factors were at work. Nixon had
made the Job Corps an issue in his presidential campaign,
contending that Job Corps enrollees could take their destiny into
their own hands. No evaluation could muster convincing evidence
against that contention, which was taken, therefore, as prima facie
evidence of the program's failure.

Follow Through is a useful example of the conflicts that arise
when program results and policy disagree. As a compensatory
preschool program for children from impoverished homes, its
purpose was to serve as a follow-up to Head Start, retaining the
gains that Head Start children made but then lost in traditional
classroom settings. Despite the good grades that Follow Through
received in repeated evaluations, both Presidents Nixon and Ford
wanted the program cut, and succeeded. Policy had once again
preempted observation.

An even more apparent case where evaluations were planned
and used to support administration policy involved the housing
programs in the early 1970s. After adopting a policy drastically
curtailing federal support, the Nixon administration prepared an
evaluation of the previously existing programs, apparently to
provide an intellectual basis for the administration position. The
Congressional Research Service responded with a critique
charging that the administration report failed to present clear
evidence of either the program's success or failure and could not
justify the administration's drastic action.

These head-on conflicts clearly represented more than heated
exchanges of assertions. In both, evaluation methodologies were
founded on preconceived notions that affected the results. Under
circumstances such as those, the very phrasing of the question to
be investigated may affect the conclusions of the evaluation. In
assessing a social program, one possible hypothesis is that it
works, requiring contrary evidence of failure. The converse
hypothesis would assume failure in the absence of proof of
effectiveness. Using the same evidence, dissimilar results can be
obtained from the two hypotheses, because different evaluation
standards may be utilized when measurements are imprecise, goals
indeterminate, and proofs equivocal. If clear evidence of success is
demanded to discount an assumption of failure, a positive verdict
is unlikely; if absolute failure must be demonstrated to alter the
assumption of success, a positive judgment is inevitable.
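
The way the starting hypothesis can dictate the verdict may be
sketched in a few lines of Python. The decision rule below is an
assumption made for illustration, not a procedure described by any
of the evaluators discussed: it builds an approximate 95 percent
interval around an estimated program effect and asks whether the
evidence is strong enough to overturn whichever presumption the
evaluator began with. The effect estimate, its standard error, and
the function name are hypothetical.

```python
def verdict(estimate, std_error, presume_success, z=1.96):
    """Return 'success' or 'failure' for one estimated program effect.

    The presumption stands unless the whole interval
    (estimate +/- z * std_error) contradicts it.
    """
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    if presume_success:
        # Failure must be demonstrated: the entire interval lies below zero.
        return "failure" if upper < 0 else "success"
    # Success must be demonstrated: the entire interval lies above zero.
    return "success" if lower > 0 else "failure"


# Hypothetical imprecise finding: a $300 earnings gain, standard error $250.
same_evidence = {"estimate": 300, "std_error": 250}

print(verdict(presume_success=True, **same_evidence))
print(verdict(presume_success=False, **same_evidence))
```

With this imprecise estimate the interval runs from a loss to a
sizable gain, so the evaluator who presumes success reports success
and the evaluator who presumes failure reports failure, although both
have examined identical evidence.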

The Evaluator's Role

The inherent limitations of evaluations notwithstanding, many
practitioners continue to promote their trade as an integral policy
tool and a few zealots will pronounce it a panacea. In an effort to
stimulate and expand evaluation, the Office of Management and
Budget tried to prod federal agencies ". . . to systematically
analyze Federal programs (or their components) to determine the
extent to which the programs have achieved (or are achieving)
their objectives." The directive added that "program evaluation
should be undertaken for the express purpose of providing timely,
relevant, accurate information concerning program performance
that is oriented to a policy or program-related decision." This
meant that the justification of program evaluations would rest
heavily upon the changes and improvements attributable to them.

The exhortation did not produce the desired results. However, it
did reflect the confusion and frustration of pinning down the role
of evaluation. The OMB fiat took a simplistic approach to a
complex problem. Program evaluation has broader objectives
than "bottom line" answers. It is inextricably associated with
program operations and policy analysis under conditions that
cannot be uniformly and neatly defined.

6. "Evaluation Management: A Background Paper," U.S. Office of Management and

At congressional oversight and appropriations hearings,
inquiries are frequently raised about the success of evaluation.
Evaluators respond with "cause and effect" examples of where
evaluation made a difference. These purported examples of
evaluations influencing specific policy or program decisions are
practically limitless when an appropriation committee has to be
convinced. Considering the premium put on the "relevance" of
findings, evaluation managers are only too happy to find cases
justifying their work.

However, evaluating evaluations is a hazardous pastime that
can quickly run afoul of the same methodological problems
plaguing evaluations of social programs. The central problem,
analogous to the control data problem, is to determine what would
have happened if an evaluation had not been done. The selection
of control groups made up of individual efforts is difficult and
makes the study of contrasting effects of evaluations and
non-evaluations on bureaucratic and political behavior extremely
frustrating. In practice, the process of assessing the impact of
evaluations is no more than a guessing game. Like the claims of
social policy framers, assertions about the usefulness of
evaluation, by those who conduct and sponsor it, reflect optimism
more than valid evidence.

Discounting vested claims, it remains doubtful what a
satisfactory norm would actually be, even if the precise impact of
evaluations could be measured. Few evaluators believe that their
findings should be the only basis of program and policy decisions.
A former Department of Labor official responsible for evaluation
cautioned that there is no single, correct assessment, but rather
that different evaluations will employ "different methodologies,
different data sets, different political approaches. Everyone has a
bias regardless of how well or rigorously trained he is." Because
the process of assimilation is usually slow and subtle,
". . . evidence has its effect through a gathering of a
preponderant weight of information. . . ." Another evaluator, a
former HEW Assistant Secretary for Planning and Evaluation,
suggested:

Other ingredients are politics, value judgments, management
and other data, situational factors (are these in fact
live alternatives?), considered reasoning, common sense,
and, on occasion, use of systematic analysis techniques
such as simulation.⁹

Evaluations cannot be the only "facts" bearing on decisions
because the available methodologies are not, and will never
be, sophisticated enough to capture all the relevant variables.

Some evaluators sensitive to questions about the relevance of
their work to policy formulation have tried to gear it better to the
needs of officials. But in doing this, they have run the risk of
filling an information void with misinformation. The number of
cases in which the impact of an evaluation was justified by its
findings can probably be matched by an equal number in which
the impact was greater than the findings warranted.

This phenomenon has been viewed by one prominent observer
of social programs as evidence of the substantial potential for
abuse that exists in taking findings too seriously:

Evaluation is being used as a decision making tool more 
than it warrants. ... To use evaluation results for 
policy-making ... we need to be able to separate fact 
from artifact.¹⁰

8. Interview with Ernst Stromsdorfer, "The Use of Evaluation,"
Symposium Workshop Conducted by the Mitre Corporation (Washington: The Mitre
Corporation, November), pp. 3-4.

9. William A. Morrill and Walton J. Francis, "Evaluation from the HEW
Perspective," remarks prepared for the Federal Executive Institute, Workshop in Program
Management, Charlottesville, VA, May 3, 1976.

10. Selma J. Mushkin, "Evaluations: Use With Caution," Evaluation, Vol. 1, No. 2,
1973, p. 31.

A 1971 evaluation of Head Start by the Westinghouse Learning
Corporation and an Ohio University group is an example of an
evaluation that had much more impact than either the evaluators
intended or the evaluation warranted. It sought a quick measure
of the program's long term effect on its participants. The motives
for the evaluation were not clear to all persons involved. Some
thought that it "was one of a series of evaluations systematically
identified as part of a larger plan." But defenders of the program
perceived a plot "to find a way to kill Head Start or to mutilate
it."¹¹ Coupled with the uncertainty about the underlying
reasons, if there were any, for the evaluation, were inconsistencies
in the choice of objective it was measuring, the purpose it was
going to serve, and alleged shortcomings in the design. The
evaluators wanted to measure accomplishments with respect to
objectives that the antipoverty officials never considered. The
sponsors of the study saw the need for an evaluation that would
support a straightforward yes or no decision on program funding.
Program officials favored a diagnostic evaluation that would
indicate effectiveness and also provide insights as to where and
how performance could be improved. Critics attacked the design
on methodological grounds.

Yet, in spite of the serious design failure, the shortcomings in
the analysis of the program's effectiveness, and the fact that most
recommendations did not derive from the evidence collected, the
Westinghouse/Ohio evaluation wielded considerable influence by
bolstering the biases of program detractors. As early as February
1969, President Nixon began hinting at the poor long term results
that preliminary findings were implying. The results were also
used to fuel debate about the fate of other antipoverty programs.

The Westinghouse Report came along at a convenient
time to shake confidence in OEO's ability to manage
successful programs and to dampen public hope in
family or child educational interventions as an effective
way to reduce poverty. . . ¹²

11. Lois-ellin Datta, "The Impact of the Westinghouse/Ohio Evaluation on the
Development of Project Head Start," in The Evaluation of Social Programs, Clark Abt,
editor (Beverly Hills, CA: Sage Publications, 1977), pp. 131-132.

12. Ibid., pp. 160-161.

In addition to methodological inadequacies, administrative
barriers, substantive ambiguities, the underlying presumptions
and biases of evaluators, and the preferences of policymakers, the
impact of evaluations can also be negatively affected by unrealistic
expectations about what can be proved. Expectations are raised
because of the success of program evaluation and related
techniques for systematic analysis outside the social sciences.
Policymakers and program officials, encountering difficulties in
making their own assessments of program effectiveness, grasp at
straws, in the hope of greater success from "scientific"
evaluation. But unlike military hardware development, space
exploration, or biomedical testing and research, government social
programs possess a large central element of human behavior that
renders their results unpredictable and rapidly changeable. They
are crude and uncertain interventions into complex social and
historical developments. Yet in spite of the dissimilarities between
the physical and social worlds, expectations persist that evaluators
of social programs can achieve the same precision as laboratory
scientists.

To paraphrase Phillip B. Crosby: Evaluation has much in
common with sex. "Everyone is for it (under certain conditions,
of course). Everyone feels they understand it (even though they
wouldn't want to explain it). Everyone thinks execution is only a
matter of following natural inclinations."¹³

It's Not Perfect, But . . .

The inherent difficulties imply a great sense of uncertainty in
evaluations of social programs. That is no reason to stop
evaluations, but they should be viewed with a degree of skepticism
and used with a degree of caution. Even persuasive findings do not
simplify decisions and certainly do not eliminate debate. But
evaluation remains necessary and helpful. Indeed, policymakers
must determine which programs to support, to modify, and to
discontinue, and they need relevant information that will help
them make sensible decisions. This calls for the continuing
collection of relevant data and their analysis.

13. Phillip B. Crosby, Quality Is Free (New York: McGraw-Hill, 1979), cited in Business
Week, March 12, 1979, p. 10.
Those who sponsor, prepare, or use evaluations should not
delude themselves that government decisions can be made on
purely objective, "scientific" grounds. In the final analysis, they
rest on personal judgments, no matter how many statistics are
furnished and how good they may be. The 19th century British 
economist Henry Clay said that statistics are no substitute for 
judgment. His admonition is as true today as a century ago. 